Friday, September 10, 2010

SQLite with stdev (standard deviation) and more!

At the moment I am collecting data - hoards of data - my test suite is a veritable cornucopia of boring data. Normally I would be logging such data into flat files, collecting some giga bytes of log files that I would then need to write a handful of Python, Perl, or Ruby scripts to make any sense of. I have done this a thousand times before ... and I'm getting tired of it!

In comes my new best friend - SQLite. This small relational-database-in-a-file is excruciatingly easy to use from almost any programming language you could think of. So instead of writing a lot of data handling scripts, this time I have written only one - a script that imports the log data into an SQLite database. All other questions about the data, and how the different test results relate to each other, can thus be formulated in simple SQL instead of through increasingly complex Python scripts.

"When are you coming to the stdev part!", you say? Calm down, I'm getting there. Working with my data in a relational database I found that there were only one piece of functionality that I was missing, and that was the possibility to select out both an average _and_ the standard deviation of a data set. Luckily, this is possible in SQLite through a simple extension.

This is what you do:

  1. Fetch the source code for the extension here [direct link]

  2. Compile it:
    Mac$ gcc -fno-common -dynamiclib extension-functions.c -o libsqlitefunctions.dylib
    Linux$ gcc -fPIC -lm -shared extension-functions.c -o libsqlitefunctions.so

  3. Copy the resulting library to some location where you want to store it. In my case that would be $HOME/opt/lib

  4. Before executing your select statements be sure to load the extension in SQLite by issuing the following command (within SQLite):
    select load_extension('/full/path/to/libsqlitefunctions.dylib');


And you're done! Now you can issue select statements with stdev as an aggregate function. E.g.:
SELECT name, stdev(seconds) 
FROM benchmarks
GROUP BY name


Note: The SQLite version included in Mac OS X 10.6 does not allow extensions, so you will have to build it yourself. That is luckily easily done - fetch the source here.