R is the current hotness in statistics. I may as well use it, as 2/3 of all statistical algorithms I’ve run into of late are implemented in it. (Of the remainder, most are written for MATLAB, which is, IMO, some kind of weird con job pulled on the maths community by disgruntled scientific computation graduates who want to double bill you for the use of your own floating point unit.)
Many things about R are surprising to me, coming as I do most recently from Python. I’m documenting my perpetual surprise here, in order that it may save someone else the inconvenience of going to all that trouble to be personally surprised.
Importing an R package, unlike importing a Python module, puts every name the package exports onto the search path, and those names may have little to do with the name of the thing you just imported:
> library("np")
Nonparametric Kernel Methods for Mixed Datatypes (version 0.40-4)
> npreg
function (bws, ...)
{
    args <- list(...)
    etc
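To see what an import actually drags in, something like the following should do. This is a minimal sketch that uses the tools package, which ships with R, as a stand-in for np (my choice, so that it runs without installing anything). It shows the search path growing and the pkg::name form that fetches a single name without attaching the rest.

```r
# Sketch only: 'tools' stands in for 'np' because it ships with R.
before <- search()                      # search path before the import
library(tools)                          # attach the package
attached <- setdiff(search(), before)   # what the import added, if anything new
exported <- ls("package:tools")         # every name now visible unqualified

# The Python-like alternative: reach for one name, attach nothing.
ext <- tools::file_ext("report.csv")
print(attached)
print(length(exported))
print(ext)
```

The pkg::name form is the closest thing I have found to a qualified Python import.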
Data structures in R can, and are intended to, provide first-class scopes for looking up names. In the course of exploring your data, you are as apt to bring the names of the columns in a data set into scope as the names of the functions in a library. This is kind of useful, although the scoping rules do make my eyes water when this intersects with function definition.
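A minimal sketch of a data set acting as a scope, using the base function with(). The data frame and its column names are invented for illustration.

```r
# Hypothetical data: the column names are made up for this example.
df <- data.frame(height = c(1.5, 1.8), weight = c(50, 80))

# Inside with(), the column names resolve as if they were variables.
bmi <- with(df, weight / height^2)

# The long-hand equivalent, spelling out the data frame each time.
bmi_explicit <- df$weight / df$height^2
print(bmi)
```

attach(df) does the same thing for the rest of the session by putting the data frame on the search path, which is where the eye-watering starts.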
Formulas are cool and ugly, like Adult Swim, and intimately bound up with the prior point.
I need to learn the R terminology to describe this.
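For the record, here is roughly what I mean, as a sketch using the cars data set that ships with R (columns speed and dist). A formula is an unevaluated expression whose variable names get looked up in whatever data you hand over alongside it.

```r
# A formula is an object in its own right; nothing is evaluated yet.
f <- dist ~ speed
print(class(f))
print(all.vars(f))   # the names the formula refers to

# lm() resolves 'dist' and 'speed' as columns of the data argument.
fit <- lm(f, data = cars)
print(coef(fit))
```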
R fosters a style of programming where attributes and metadata of data objects are set by using accessor functions, e.g. in matrix column naming:
> m=matrix(0, nrow=2,ncol=2)
> m
     [,1] [,2]
[1,]    0    0
[2,]    0    0
> colnames(m)
NULL
> colnames(m)=c('a','b')
> colnames(m)
[1] "a" "b"
> m
     a b
[1,] 0 0
[2,] 0 0
If you want to know, by observing its effects, whether an apparent function returns some massaged product of its argument or decorates the argument, well, check the manual. As a rule, the accessor functions in their assignment form update one object and print nothing, although so do, e.g., plot functions.
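As far as I can tell, the two behaviours can be told apart like this. The sketch below contrasts an ordinary function, which hands back a new value, with the assignment form of an accessor, which is shorthand for calling a function literally named `colnames<-` and rebinding the variable to its result.

```r
m <- matrix(0, nrow = 2, ncol = 2)

# Ordinary function: returns a massaged copy, m itself is untouched.
shifted <- m + 1

# Assignment form of the accessor: m is updated, nothing is printed.
colnames(m) <- c("a", "b")

# The same machinery called by hand: it returns a modified copy.
m2 <- `colnames<-`(m, value = c("x", "y"))

print(colnames(m))
print(colnames(m2))
```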
You make vectors by using a call to a function called c. Witness:
> c('a', 'b', 'c', 'd')
[1] "a" "b" "c" "d"
If you just literally type a vector in, it will throw an error:
> 'a', 'b', 'c', 'd'
Error: unexpected ',' in "'a',"
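Two further things about c that caught me out, sketched below: it flattens its arguments into one vector rather than nesting them, and mixing types coerces everything to a common type.

```r
# Nested calls are flattened into a single vector.
flat <- c(c(1, 2), 3)
print(length(flat))

# Mixing numbers and strings coerces the lot to character.
mixed <- c(1, "a")
print(class(mixed))
```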
In short, a powerful, effective, diverse, well-supported language which makes the statistician in me wriggle his toes in pleasure even as the computer scientist in me winces.