I would like to introduce Rincanter, a binding from Clojure/Incanter to the R language for statistical computing. It is mostly a fairly thin layer over the existing Java/R bindings done by the folks at rosuda.org.

Why did I write this? Thanks to the hard work of Rich Hickey, David Liebke and others, you can already do impressive statistical data-mining tasks using only Clojure. However, the R project has a huge body of libraries and datasets that the much smaller Incanter community won't be able to match, at least in the short term. Unless… we provide an easy-to-use bridge that lets us work mostly in Clojure and break into the R cookie jar for its datasets and function libraries when we need to. So, the goals for this project are:
Short Term:
- Provide access to the vast datasets available in R to Clojure/Incanter users. This requires converting between R data types and Clojure/Incanter data types. For the most part this is working.
- Provide access to the large body of R packages and function libraries to fill in the gaps where Clojure and Incanter don't have functional coverage. This is partially working, but there are probably quite a few places where the Clojure side and the R side just won't match up without some serious fudging.
Long Term:
- Provide a scaffold for porting R packages, functions and datasets over to what many people believe is a stronger base language. While R is an impressive language in many ways, even some of its founders think that a full-featured Lisp could be a better foundation for an interactive statistical environment. I would like people to strongly consider Clojure for that position.
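To give a feel for what "converting between R data types and Clojure/Incanter data types" involves, here is a minimal sketch in plain Clojure. It is not Rincanter's actual conversion code. It only assumes that an R data frame is column-oriented (named columns of equal length) and that on the Clojure side it is often convenient to work with rows as maps. The names `r-style-columns`, `columns->rows` and `rows->columns` are hypothetical, and the numbers are made up for illustration.

```clojure
;; Hypothetical stand-in for an R data frame: column name -> vector of values.
;; The values are invented for illustration only.
(def r-style-columns
  {"cvd"    [40 50 45]
   "o3mean" [20.5 31.0 27.25]})

(defn columns->rows
  "Turns a non-empty map of equal-length columns into a vector of row maps
  keyed by keywords."
  [columns]
  (let [ks (map keyword (keys columns))]
    ;; mapv walks all columns in step, producing one map per row
    (apply mapv (fn [& vs] (zipmap ks vs)) (vals columns))))

(defn rows->columns
  "Inverse direction: a vector of row maps back to a map of
  string-named column vectors."
  [rows]
  (into {}
        (for [k (keys (first rows))]
          [(name k) (mapv k rows)])))

(def rows (columns->rows r-style-columns))

(println rows)
(println (rows->columns rows))
```

A real bridge also has to deal with R-specific details such as factors, missing values and attributes, which is where most of the "serious fudging" mentioned above is likely to come from.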
A quick walkthrough
This is a quick example of how we can access R datasets available remotely on CRAN and import them into Incanter. We will be interacting with Incanter and R inside a REPL session.
To start with, you will need to get Rincanter up and running. There are fairly detailed instructions for doing this on the project home page.
    $ cd /path/to/where/you/downloaded/rincanter
    $ lein repl
You should now have a REPL running with the required classes and packages loaded, and we are ready for an interactive session.
    user=> (use '(incanter core stats charts))
    user=> (use '(com.evocomputing rincanter))
    ;; just ensure that we can connect to the embedded R engine
    user=> (get-jri-engine)
    ;; Now let's load an R package from CRAN
    ;; Picking "season" package at random
    user=> (r-install-CRAN "season")
    The downloaded packages are in /var/folders/ZZZZZXXXXXYYYY4/downloaded_packages
    nil
    ;; The package has been installed, now load the library
    user=> (r-eval "library(season)")
    ["season" "coda" "survival" "splines" "mgcv" "MASS" "lattice" "stats" "graphics" "grDevices" "utils" "datasets" "methods" "base"]
    ;; Load the CVDdaily dataset
    user=> (r-eval "data(CVDdaily)")
    ;; OK- we can now access/convert this data to Incanter!
    (def *CVDdaily* (r-get "CVDdaily"))
    ;; It is now data owned by Incanter, do what you want with it
    user=> (col-names *CVDdaily*)
    ["date" "cvd" "dow" "tmpd" "o3mean" "o3tmean" "Mon" "Tue" "Wed" "Thu" "Fri" "Sat" "month" "winter" "spring" "summer" "autumn"]
    user=> (with-data *CVDdaily* (mean ($ :cvd)))
    45.110481032459916
    (with-data *CVDdaily*
      (def lm (linear-model ($ :cvd) ($ :o3mean)))
      (doto (scatter-plot ($ :o3mean) ($ :cvd))
        (add-lines ($ :o3mean) (:fitted lm))
        view))
As you can see, it's fairly easy to grab any existing package and dataset on CRAN, download it, and pull the data into Incanter.
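The install, load and fetch steps from the session could be bundled into one small helper. The sketch below is hypothetical and not part of Rincanter: `load-r-dataset` and the keys `:install!`, `:eval!` and `:get-var` are names invented here. The R-facing functions are passed in as arguments so that the example can be tried with stubs, without a running R engine.

```clojure
;; Hypothetical helper, not part of Rincanter: runs the same steps as the
;; REPL session (install package, attach it, load dataset, fetch it).
(defn load-r-dataset
  "Installs `pkg`, attaches it, loads `dataset-name` and returns whatever
  `get-var` yields for it. `r-fns` supplies the three R-facing functions."
  [{:keys [install! eval! get-var]} pkg dataset-name]
  (install! pkg)
  (eval! (str "library(" pkg ")"))
  (eval! (str "data(" dataset-name ")"))
  (get-var dataset-name))

;; Stub functions that only record what they were asked to do,
;; so the sketch runs without R installed.
(def calls (atom []))

(def stub-fns
  {:install! (fn [pkg]  (swap! calls conj [:install pkg]) nil)
   :eval!    (fn [code] (swap! calls conj [:eval code]) nil)
   :get-var  (fn [nm]   (swap! calls conj [:get nm]) {:name nm})})

(def result (load-r-dataset stub-fns "season" "CVDdaily"))

(println @calls)
(println result)
```

Inside a Rincanter REPL, the same helper would presumably be called with the real functions from the session, for example `(load-r-dataset {:install! r-install-CRAN :eval! r-eval :get-var r-get} "season" "CVDdaily")`.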