Accessing ‘R’ from C#–Lessons learned

In my family we say, “If no one took a picture, it didn’t happen.” I’m getting to be that way somewhat with various software technologies. Some little voice inside me says, “If I can’t call it from C#, it can’t be a real technology.” That’s not true of course, but it helps explain my recent struggle to figure out how to call the analytic engine ‘R’ from C#. Of course I did have a business reason…

The R community is pretty vibrant, to a point. If you want help with analytic esoterica, there are tons of forums and newsgroups. But if you want to think about incorporating R into a commercial environment, the road is less traveled. (Sorry for the mixed metaphor.)

I did manage to get it to work. I started with this 2008 article, “The R Statistical Language and C#.NET: Foundations,” by Jeff Cromwell. That gave me the basic idea of how to connect. The stack I used includes:

· R 2.11.1 32-bit for Windows (Windows 7 in my case)

· The matching rscproxy package

· The R-(D)COM Interface

The current version of R is 2.12.0. I started there but had no luck getting my simple C# program to connect to R. I eventually found a posting that says the DCOM interface does not work with the 2.12.0 version of R. Once I backtracked and got 2.11.1 and the matching rscproxy package, I got the connection to work.

Here is the recipe:

1. Download R 2.11.1 here.

2. When you install R, make sure to check the box that saves the version number in the registry. You’ll see it in the second or third dialog of the install routine. It seems to be checked by default.

3. Download the 2.11 version of rscproxy here.

4. Unzip the rscproxy package somewhere you can find later. Within that folder you will find an ‘rscproxy’ folder. Copy that folder to your library folder under the R directory. For me, that is C:\Program Files (x86)\R\R-2.11.1\library. Adjust for your installation as appropriate.

5. Download the DCOM Interface from here.

6. Install the DCOM Interface by running the exe you downloaded in step 5.

7. At this point I started R and used the library() command to be sure rscproxy was in the library. I further used library(rscproxy) to make sure I could load it. If your version of rscproxy is newer than your version of R, you will find out at this point.

8. The DCOM Interface installs a simple test routine at “C:\Program Files (x86)\R\(D)COM Server\samples\Simple\simple.exe” (on my machine). You’ll find it in your start menu as well with the name “Server 01 – Basic Test”. If you run this, you will see a GUI with a ‘Start R’ button. If that works, you’re good to go. If it does not, make sure all the versions align and that you can load the proxy from within R. Remember, 2.12.0 does not seem to work at this point (11/18/2010).

I’m sure there are other ways to do this; I’m interested in learning about other approaches, especially in terms of performance and robustness.

Once the test program worked, I wrote a simple C# console program based on the sample in the Cromwell article. The only thing I had to puzzle out, slightly, was adding the reference to the libraries. It’s here:

[Image: screenshot showing where to add the library reference]

That gives you the key entry points to establish a connection, set and get R symbols and cause evaluation. In other words, this assembly has the implementation of IStatConnector. There are other libraries to invoke R’s graphics engine and other features. I have not tried these yet.

The one change I made to Cromwell’s sample code is to use ‘var’ instead of ‘object’. C# has supported var since version 3.0, which shipped in late 2007, so the article could have used it as well.

This code works: 

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using StatConnectorCommonLib;
using STATCONNECTORSRVLib;
 
namespace R_TestClient
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
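                StatConnector rConn = new StatConnector();

                // Connect to R through the (D)COM interface.
                rConn.Init("R");

                // Push n1 into R, draw n1 normal samples there, and read them back.
                rConn.SetSymbol("n1", 20);
                rConn.Evaluate("x1<-rnorm(n1)");
                var o = rConn.GetSymbol("x1");

                foreach (double d in o)
                    Console.WriteLine(d);

                rConn.Close();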
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
 
            Console.WriteLine("Press ENTER to exit.");
            Console.ReadLine();
        }
    }
}
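One thing the program above does not handle: if Evaluate throws, the call to Close is skipped. Here is a minimal sketch of one way to guarantee the close, using IDisposable. Everything in it is an assumption made for illustration: IRConnection is a hypothetical interface that mirrors only the five calls the program makes, RSession and FakeConnection are made-up names, and the fake stands in for R so the sketch runs without the COM libraries. To use it against the real StatConnector you would need a thin adapter, which is not shown and has not been tried here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface: mirrors only the calls the program above makes,
// so this sketch compiles without the StatConnector COM libraries.
interface IRConnection
{
    void Init(string name);
    void SetSymbol(string name, object value);
    void Evaluate(string expression);
    object GetSymbol(string name);
    void Close();
}

// Wrapper that closes the connection on every exit path, including exceptions.
sealed class RSession : IDisposable
{
    private readonly IRConnection conn;
    private bool open;

    public RSession(IRConnection conn)
    {
        this.conn = conn;
        conn.Init("R");
        open = true;
    }

    // Set one input symbol, evaluate an expression, and fetch one result symbol.
    public object Run(string inputName, object input, string expression, string resultName)
    {
        conn.SetSymbol(inputName, input);
        conn.Evaluate(expression);
        return conn.GetSymbol(resultName);
    }

    public void Dispose()
    {
        if (!open) return;   // safe to call more than once
        open = false;
        conn.Close();
    }
}

// Stand-in used only to exercise the wrapper without R installed.
sealed class FakeConnection : IRConnection
{
    public readonly List<string> Calls = new List<string>();

    public void Init(string name) { Calls.Add("Init"); }
    public void SetSymbol(string name, object value) { Calls.Add("SetSymbol"); }
    public void Evaluate(string expression)
    {
        Calls.Add("Evaluate");
        if (expression.Length == 0)
            throw new ArgumentException("empty expression");
    }
    public object GetSymbol(string name) { Calls.Add("GetSymbol"); return new double[] { 1.0, 2.0 }; }
    public void Close() { Calls.Add("Close"); }
}

static class Demo
{
    static void Main()
    {
        var fake = new FakeConnection();
        try
        {
            using (var session = new RSession(fake))
            {
                // The fake throws on an empty expression, simulating a failed evaluation.
                session.Run("n1", 20, "", "x1");
            }
        }
        catch (ArgumentException)
        {
            Console.WriteLine("Evaluation failed.");
        }

        // Close appears in the list even though Evaluate threw.
        Console.WriteLine(string.Join(",", fake.Calls));
    }
}
```

The wrapper does not cover a failure inside Init itself; in that case nothing was opened, so nothing is closed.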

I’m off to learn R and try more substantial processing. Let me know, via comments, if I have any of this wrong or if you found better ways to do the same things. Thanks!

My next steps:

· Try more complicated R processing.

· Understand how to connect to remote instances of R.

· Test scale (data size) and performance.

Posted in Technology

I.T. Can Run, But I.T. Can’t Hide (from Microsoft Excel)

On May 14, I had the privilege of co-presenting with Andrew Brust to the NYC Chapter of TDWI and at the NY Tech Council BI and Analytics special interest group. Our topic was “I.T. Can Run, But I.T. Can’t Hide (from Microsoft Excel)”. We didn’t pick the title!

The slides are here. This posting is more-or-less the speaking notes for the slides. Andrew did three great demos; I will summarize them below.

Conventional wisdom holds that end-users in corporations adore Excel and IT departments hate it. We believe the situation is far from black-and-white. We see many reasons end-users love Excel. Most of these are obvious, including the ease of building an appropriate and relevant solution, quickly and without interference. Obvious reasons IT departments worry about widespread use of Excel include fears about data quality, data staleness and security.

We found that users have their own fears about using Excel to serve their own needs independently of IT and its data sources. They definitely have to work harder to find the data they need for decision making. They have to build models and calculations, perhaps from scratch. And responsible employees worry about making mistakes and potentially losing data through email or file shares.

Counter to the conventional wisdom, IT benefits from users serving their own needs with Excel or other tools. In a world of do-more-with-less and hyper-competitiveness among companies, IT is pained by being the bottleneck that holds back the creativity and innovation of employees.

The slides provide more detail. The thing that surprised Andrew and me is this: we expected everything that users love about using Excel to be the opposite of the things that IT hates about it. If you’ve ever built a 2×2 grid to enumerate the pros and cons of two paths, A and B, you’ve often seen the pros for A are the cons for B and vice-versa. In our case we were surprised, and delighted, to realize that end-users and IT share similar concerns about data quality, modeling accuracy and working together. This being the case, we found several strategies that companies can deploy to get the empowerment that users (& companies) want and the alignment the company requires.

A key strategy is enabling self-service, giving end-users access to quality data and a tool they enjoy using and expecting them to satisfy their own needs for analytics. An observation of ours is that when employees get home, that is out of the office, they largely serve their own information and information integration needs. The Facebook generation, and indeed anyone who is facile using the web, is used to working with different data sources and “mashing-up” their own, usually simple, solutions. We have email in Hotmail or Gmail, photos in Flickr, contacts in LinkedIn, friends in Facebook, etc. We track our money in Quicken or Mint online. We get our maps from Google or Bing. And mostly we don’t build apps as much as occasionally cut-and-paste.

Having end-users with Web 2.0 experience bodes well for implementing self-service at the office. But the flip side is the Facebook generation has ideas about sharing that can give IT pause. The first worry is about employees publishing company data or secrets in social media. That’s largely a cultural issue. Things my generation takes for granted are not second-nature for some younger generations.

Beyond the leakage problem, the other potential negative is a willingness of many people to ask their friends for information and to trust that data more than the data they get from “authorities”. Studies have found that 78 percent of people trust peer recommendations[1] while only 14 percent trust traditional advertising[2]. The scenario IT fears is this: Joe needs some data about the square footage of all of the stores in his district. He takes a stab at getting the data from SAP, but fails. So he sends a few IMs to people in his work-network and finds out that Bob has an old email from the Real Estate team that lists the square footages. So Joe copies that data into his spreadsheet and never looks back.

So the dilemma for IT is how to encourage self-service, but using corporate data. In his demos, Andrew started by showing how difficult it is for users to directly access corporate databases. They need to think about servers, tables and views, queries, etc. It’s a non-starter for most employees. They will generally go around IT if this is the only level of data support IT provides.

In his second demo, Andrew showed what I call the brick o’data approach. When we built SQL Server Reporting Services in the early 2000s, we observed that one person’s report was another person’s data source. We invented (& patented) an idea called report-as-data-source. SQL Server 2008 R2 supports this, serving data inside reports as Open Data (http://www.odata.org/producers) feeds. But what about companies that have not yet adopted Excel 2010 (required for consuming OData) or SQL Server 2008 R2?

We found out that since 1997 users could import data from web pages straight into Excel. Excel will scrape data out of HTML tables. All you need is a URL. You can set a refresh period so you always have fresh or relatively fresh data. The brick o’data approach uses simple web apps to publish views of data in plain HTML tables. They are not meant for humans as much as for Excel web queries. You can find more information here: http://bit.ly/OldSchoolXL. The approach we recommend is to build a well-known site inside your firewall with as many links to views as you think are reasonable. Keeping this inside the firewall mitigates most security concerns, or at least allows you to address them the same way as any other data security challenges.
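To make the brick o’data idea concrete, here is a minimal sketch of the rendering half: turning a view’s rows into the kind of plain HTML table an Excel web query can scrape. It is not the code from the demos. The BrickOfData class, the ToHtmlTable method and the sample store rows are all made up for illustration, and a real page would pull its rows from a database and be served by a web app rather than printed to the console.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

static class BrickOfData
{
    // Renders one header row plus data rows as a bare HTML table.
    // No styling or scripts: the audience is an Excel web query, not a person.
    public static string ToHtmlTable(IList<string> headers, IEnumerable<IList<string>> rows)
    {
        var sb = new StringBuilder();
        sb.Append("<table>\n<tr>");
        foreach (string h in headers)
            sb.Append("<th>").Append(WebUtility.HtmlEncode(h)).Append("</th>");
        sb.Append("</tr>\n");

        foreach (IList<string> row in rows)
        {
            sb.Append("<tr>");
            foreach (string cell in row)
                sb.Append("<td>").Append(WebUtility.HtmlEncode(cell)).Append("</td>");
            sb.Append("</tr>\n");
        }

        sb.Append("</table>");
        return sb.ToString();
    }

    static void Main()
    {
        // Made-up sample data, for illustration only.
        var headers = new[] { "Store", "Square feet" };
        var rows = new List<IList<string>>
        {
            new[] { "Store A", "12000" },
            new[] { "Store B & Co", "8500" },
        };

        Console.WriteLine(ToHtmlTable(headers, rows));
    }
}
```

Encoding every cell keeps characters such as & and < from breaking the markup, which matters because the consumer parses the HTML rather than reading it.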

In his third demo, Andrew showed PowerPivot and Open Data working together to provide fresh, secure data and amazing BI functionality in Excel. Much has been written about PowerPivot, so I won’t repeat that here.

Finally, we discussed five strategies for companies to use to cope with end-user self-service AND data quality:

· Transparency & Attribution

· Self-service

· Sharing

· Publishing

· Model analytics

These range from mostly cultural (transparency) to mostly technical (model analytics). The slides have more details.

Overall, we found some optimism for companies trying to span the spectrum of empowerment to alignment. And we had fun building and delivering the presentation. Thanks to Jon Deutsch, President of the Tri-City TDWI chapter, for putting the program together.


[1] July 2009 Nielsen Global Online Consumer Survey

[2] “Marketing to the Social Web,” Larry Weber, Wiley Publishing, 2007

Posted in Business Intelligence, Open Data, Technology

iPad Take 2: Revolution or Evolution?

Having used the iPad for several days now, I believe it is currently an evolution of something, not a revolutionary new thing. But it will become revolutionary if the right applications emerge. Today I think the sweet spot is replacing the iPod Touch for the 50+ generation. The iPad is the best media player I’ve seen yet combined with a great eBook function. Movies are crisp, clear and large. The sound is great. The iPod functionality benefits from the additional screen real estate both in terms of information layout and type size. (I’m one of those 50+ people.)

I downloaded the Kindle application from Amazon and grabbed my Kindle books. I also bought an iBook from Apple. Both are fantastic; I prefer either to my actual first generation Kindle.

The iPad is also a cool TV/movie companion; it’s fun to quickly look up actors or other movies and shows while you watch the actual, real television. (I’m one of those 50+, ADD people.)

The iPad keyboard is simply not good enough for me to write much more than a quick email. So it won’t replace my laptop.

So, as a media player and as a super convenient web browser, the iPad is evolutionary. The revolution comes when compelling apps emerge that use touch to reach new uses and new users. Home automation would be fun and useful, and easier to learn for many, if based on touch gestures instead of mouse gestures. I believe my non-computer relatives could pick up a touch-based application to program their thermostats or set their lawn sprinklers. Running the media center, or even just programming the DVR, would benefit from the touch model. A point in support is the Sonos iPhone application that so many Sonos customers prefer to the normal Sonos controller. (Granted, the non-computer types among us probably don’t have a Sonos either.)

Some of these applications already exist for the iPhone and iPod Touch. So you could argue that even there, the iPad is an evolution of that platform. You’d be effectively saying, I think, that touch is the revolution. I won’t argue; I think that is true. I also think the iPad is the form factor that will drive touch-based applications deeper into the home and maybe businesses. I think the current price is a bit high, but expect that to fall over time.

Regardless, I love mine. I can see owning an iPhone for, well, talking and apps on the go, and an iPad for media and applications around the home or office. (I’m one of those 50+, ADD, has-a-Verizon-contract people.)

Posted in Apple, iPad, Technology

First impressions of the iPad

The purchasing experience was top-notch.  I reserved mine a few weeks ago online.  The store opened at 9am and reserved units were held until 3pm.  When we arrived, an Apple Store employee took our name, signed us in using his iPhone and within seconds someone else came up, introduced himself and took us into the store.  We were done in less than ten minutes.

The machine itself is very slick.  It does feel good in your hand – that hard-to-define sense of quality.  It’s nice to open a consumer device that is already charged.  Nothing kills the moment like that note in the manual that says, “Charge completely before using.”  We took ours to Starbucks and were online within a few moments.  Web browsing is pretty easy, the screen is excellent, the colors are great, and pages load reasonably quickly, though slower than on my laptop.

The keyboard is as so-so as I’ve read in most if not all of the reviews.  I accessed my email through OWA and responded to one mail quickly.  But I would never write a long document or email using the onscreen keyboard.

I downloaded one video to check out movies.  I was impressed.  Great color, no flickering, lots of resolution.  Onboard sound seems good, will check out headphones later.  (No headphones in the box.)

The charging and sync cable is specific to the iPad.  They told us that iPod cords won’t work.  That’s a bummer – and the cable is unmarked, so the potential for mix up is there.  I will have to label this one.

Early conclusions:

  • This won’t replace my laptop.  I code too much and type too much to make do with an iPad only while traveling around town or out of town.
  • It won’t replace the laptop I keep open on the kitchen counter for checking email periodically.
  • It will sit by the TV; it’s perfect for looking stuff up while watching shows or a movie.  Instant-on is sweet.
  • It will replace my iPod touch when I want to watch downloaded TV shows while working out.
  • I will take it on an airplane at least once for the same reason.  We’ll see if carrying this and a laptop is at all practical. 

Pundits are debating how revolutionary or evolutionary the iPad really is.  My going-in assumption was evolutionary for techies and revolutionary for non-techies.  I think there is still a lot of tech know-how needed to set up and use an iPad.  I’m going to enjoy mine, even if just for casual web browsing and watching TV shows while working out.  Thinking about my relatives who don’t use computers or use them lightly, I’m not sure the iPad, out-of-the-box anyway, will increase their usage.  I’m looking forward to showing them the iPad and seeing their reactions.

Posted in Apple, iPad, Technology

Welcome!

Looks like a blog, smells like a blog… it’s a blog.  My little corner of that inter-webs thing everyone is always talking about.  I haven’t totally worked out what I will do here.  At minimum it will contain links to articles I write and slide decks I deliver.  As time permits, I’ll post some thoughts on things that matter to me, mainly around software-as-a-service, cloud computing and business intelligence.  I’m allowing comments, but in the near-term will approve them before they post.  I may open that up at some point, once I’m sure I can control the SPAM reasonably well.  You’re always welcome to reach me at the email address on my ‘About’ page.  I’m pretty responsive on email.  Enjoy your day!

A few ‘blog-y’ details… I host with Dreamhost, the blog engine is WordPress, the designer is my son Andrew.  I’m pretty impressed with all three.

Posted in Blog