The Living Thing

R, the language

R is the current hotness in statistics. I may as well use it, as 2/3 of all statistical algorithms I’ve run into of late are implemented in it. (Most of the rest are written for MATLAB, which is, IMO, some kind of weird con job pulled on the maths community by disgruntled scientific computation graduates who want to double bill you for the use of your own floating point unit.)

Pros and cons

Good

  • combines unparalleled breadth, flexibility and community, at least as far as statisticians, data miners, machine learners and other such assorted folk as I am pleased to call my colleagues are concerned. To get some sense of this thriving scene, check out R-bloggers. This community alone is enough to sell R, whatever you think of the language. (cf “Your community is your best asset”)
  • amazing, statistically-useful plotting (cf, e.g., the awful battle to get error bars in mayavi)

Bad

  • Seems, to my personal aesthetic, to have been written by a team who prioritise delivering statistical functionality right now over making an elegant, lean or consistent language to access that functionality. I’d much rather access those same beautiful libraries through a language which has had as many computer scientists winnowing its ugly bits as Python or Ruby have had. Or Smalltalk. (And, for that matter, as many amazing third-party libraries for non-statistical things as these other languages promise.) Examples of warts:
    • Parser and syntax weirdness like random scope.
    • Call-by-value semantics (in a big-data processing language?)
    • ...ameliorated not even by array views.
    • Object model tacked on after the fact, which is fine, but...
    • ...in that case I’d at least like it to be a proper functional language with regard to modern, efficient, functional data structures. Nah.
  • One of the worst names to google for ever (cf Processing, Pure)
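To make the call-by-value wart above concrete, here is a minimal sketch. The function name `add_one_to_first` is made up for illustration. As far as I understand it, R defers the actual copy until the argument is modified, but from the caller's side the effect is the same: the original is never touched.

```r
# Hypothetical helper: tries to modify its argument "in place".
add_one_to_first <- function(v) {
  v[1] <- v[1] + 1  # this changes the function's own copy only
  v                 # the modified copy has to be returned explicitly
}

original <- c(10, 20, 30)
changed <- add_one_to_first(original)

print(original)  # the caller's vector is unchanged
print(changed)   # only the returned copy differs
```

The upshot is that any function "updating" a large data set has to hand back a whole new object, which is what makes the lack of array views sting.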

R for Pythonistas

Many things about R are surprising to me, coming as I do most recently from Python. I’m documenting my perpetual surprise here, in order that it may save someone else the inconvenience of going to all that trouble to be personally surprised.

Opaque imports

Importing an R package, unlike importing a Python module, brings in random cruft that may have little to do with the name of the thing you just imported:

> library("np")
Nonparametric Kernel Methods for Mixed Datatypes (version 0.40-4)
> npreg
function (bws, ...)
{
    args <- list(...)
etc
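For comparison, R does have something closer to a Python-style qualified import: the `::` operator. A minimal sketch, using the `tools` and `stats` packages that ship with R rather than `np`, so that it runs without installing anything:

```r
# Namespace-qualified call: the package is loaded, but none of its
# names are attached to the search path.
ext <- tools::file_ext("results.csv")
print(ext)

# By contrast, attaching a package with library() exposes every name it
# exports. Count what one standard package would bring into scope.
stats_exports <- getNamespaceExports("stats")
print(length(stats_exports))
print("median" %in% stats_exports)
```

Writing `pkg::name` everywhere is more typing, but at least you can see where a name came from.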

Lookup

Data structures in R can, and are intended to, provide first-class scopes for looking up names. In the course of your explorations into data, you are as apt to bring the names of columns in a data set into scope as the names of functions in a library. This is kind of useful, although the scoping proceedings do make my eyes water when this intersects with function definition.
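A small sketch of what this looks like in practice, using `with()` from base R. The data frame and the function `double_height` are invented for illustration. The second half shows the eye-watering part: the same name inside a function resolves to a column or to an outside variable depending on what the data happens to contain.

```r
df <- data.frame(height = c(1.5, 1.8, 1.7), weight = c(50, 80, 65))

# Column names are looked up inside the data frame first.
mean_height <- with(df, mean(height))
print(mean_height)

# A variable outside the data frame that happens to share a column name.
height <- 100

# Hypothetical function: which `height` does it mean?
double_height <- function(d) with(d, height * 2)

from_column <- double_height(df)                       # uses the column
from_outside <- double_height(data.frame(weight = 1))  # falls through to the outer variable
print(from_column)
print(from_outside)
```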

Formula types

Formulas are cool and ugly, like Adult Swim, and intimately bound up in the prior point.
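A minimal sketch of a formula in use, assuming the `cars` data set that ships with R (columns `speed` and `dist`). The formula is an unevaluated object. Its variable names are only resolved against the `data` argument when the model is fitted, which is the lookup business from the previous section again.

```r
# A formula is a first-class object; nothing is evaluated yet.
f <- dist ~ speed
print(class(f))
print(all.vars(f))  # the names the formula refers to

# The names are resolved as columns of `cars` at fitting time.
fit <- lm(f, data = cars)
print(names(coef(fit)))
```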

Assignment to function calls

I need to learn the R terminology to describe this.

R fosters a style of programming where attributes and metadata of data objects are set by using accessor functions, e.g. in matrix column naming:

> m=matrix(0, nrow=2,ncol=2)
> m
     [,1] [,2]
[1,]   0   0
[2,]   0   0
> colnames(m)
NULL
> colnames(m)=c('a','b')
> colnames(m)
[1] "a" "b"
> m
     a b
[1,] 0 0
[2,] 0 0

If you want to know by observing its effects whether an apparent function returns some massaged product of its argument, or whether it decorates the argument, well, check the manual. As a rule, the accessor functions, used in assignment form, operate on one object and print nothing at the console, although the same is true of, e.g., plot functions.
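For what it's worth, the R documentation calls these "replacement functions": a call like `colnames(m) <- value` is evaluated roughly as ``m <- `colnames<-`(m, value)``. A minimal sketch of rolling your own follows. The accessor pair `unit_label` and `unit_label<-` is hypothetical, invented here for illustration.

```r
# Hypothetical getter: reads an attribute off the object.
unit_label <- function(x) attr(x, "unit_label")

# Hypothetical replacement function: the name must end in `<-`, the last
# argument must be called `value`, and it must return the modified object.
`unit_label<-` <- function(x, value) {
  attr(x, "unit_label") <- value
  x
}

lengths_m <- c(1.2, 3.4)
unit_label(lengths_m) <- "metres"  # sugar for lengths_m <- `unit_label<-`(lengths_m, "metres")
print(unit_label(lengths_m))

# The built-in accessors can be called the long way round too.
m <- matrix(0, nrow = 2, ncol = 2)
m <- `colnames<-`(m, c("a", "b"))
print(colnames(m))
```

So the apparent in-place decoration is still a function returning a modified copy, with the rebinding done for you by the assignment syntax.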

No scalar types...

A float is a float vector of size 1:

> 5
[1] 5
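A few checks that make the point, using only base R functions:

```r
x <- 5

print(length(x))           # a "scalar" has a length, like any vector
print(is.vector(x))
print(identical(x, c(5)))  # same thing as a one-element vector
print(x[1])                # and it can be indexed like one
print(typeof(x))           # numbers typed without a suffix are doubles
```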

...yet weird vector literal syntax

You make vectors by using a call to a function called c. Witness:

> c('a', 'b', 'c', 'd')
[1] "a" "b" "c" "d"

If you just literally type a vector in, it will throw an error:

> 'a', 'b', 'c', 'd'
Error: unexpected ',' in "'a',"
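Since `c` is an ordinary function rather than literal syntax, it has a couple of behaviours a Pythonista might not expect from a list literal. A short sketch, with variable names of my own choosing:

```r
# Nested calls are flattened: there is no vector-of-vectors.
nested <- c(c(1, 2), 3)
print(nested)

# Mixed types are silently coerced to a common type, here character.
mixed <- c(1, "a")
print(mixed)

# The colon operator is the other common way to build a vector;
# it yields integers rather than doubles.
ints <- 1:4
print(ints)
```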

In short, a powerful, effective, diverse, well-supported language which makes the statistician in me wriggle his toes in pleasure even as the computer scientist in me winces.

