NoSQL meet SQL

Wednesday, November 11, 2009 - Jerry Sievert

Often times, I don't get to get my hands dirty at work. Not being one to let myself atrophy, I keep my eyes out for new and exciting things to catch my fancy, and spend hours and hours writing new code: usually reinventing the wheel, often times poking and prodding, just trying to figure out what I'm going to do with what I find.

One of the projects that caught my eye a bit over a year ago was CouchDB, a RESTful document storage engine, that happens to have Map/Reduce support. Being the database freak that I am, I started thinking about all of the projects I've worked on in the past that could have been improved with a document model over pseudo-relational databases. So many came to mind, and I was excited about the flexibility of CouchDB; so useful for so many things, especially with strong data analysis abilities via map and reduce.

Then came a day when all of my meetings were cancelled, and instead I sat in front of a large display with a group of other people running query after query, analyzing large amounts of data -- SQL query after SQL query. By the end of the day I was exhausted and feeling just a little bit dirty. Never one to let good dirt go to waste, I sat down with a beer in hand and started scheming.

Enter an idea: what if I could marry the ease of SQL to the flexibility of CouchDB? Actually, it was more like: what's the most amusing thing I could build after a day like today?

I'm a big fan of perl: there's nothing perl can't do, and with a wonderful repository like CPAN throwing together a script that would make the devil himself shiver becomes an easy task. Sure, perl can be so messy and complex that it can't even be parsed, but like any language, under the right circumstances it can really shine.

An hour later, I had the beginnings of a masterpiece, a bridge between the worlds of SQL and NoSQL. It was the worst abuse of technology that I could think of: a SQL front-end to CouchDB.

Something like this should never be used in production. In fact, it should probably never be used at all. It works by parsing the SQL statement, does some quick sanity checks, and builds a map function to be called with as a temporary view. These temporary views are then queried, and the response from CouchDB is printed to the terminal.

There are a couple of prerequisites, all available from CPAN:

  • Getopt::Long
  • SQL::Statement
  • LWP::UserAgent
  • HTTP::Request
  • JSON

Then, just download and "enjoy": couchsql.pl

Some quick examples:

miniBook:~ jerry$ ./couchsql.pl "SELECT * FROM stock"
{"total_rows":9611,"offset":0,"rows":[
{"id":"01-02-2009:A","key":null,"value":{"_id":"01-02-2009:A","_rev":"1-28512862
2","month":1,"date":"2009/01/02","quarter":1,"high":"16.3000","close":"16.2400",
"day":2,"id":"01-02-2009:A","company":"AGILENT TECH INC","symbol":"A","epoch":"1
230854400","volume":"3030200","day_of_year":2,"day_of_quarter":2,"open":"15.6000
","type":"STOCK","year":2009}},
{"id":"01-02-2009:AA","key":null,"value":{"_id":"01-02-2009:AA","_rev":"1-170752
1035","month":1,"date":"2009/01/02","quarter":1,"high":"12.4400","close":"12.110
0","day":2,"id":"01-02-2009:AA","company":"ALCOA INC","symbol":"AA","epoch":"123
0854400","volume":"30645200","day_of_year":2,"day_of_quarter":2,"open":"11.3600"
,"type":"STOCK","year":2009}},
[...]
]}

miniBook:~ jerry$ ./couchsql.pl "SELECT symbol, open, close, high, date FROM stock WHERE date = '2009/03/01' ORDER BY symbol"
{"total_rows":98,"offset":0,"rows":[
{"id":"03-03-2009:A","key":null,"value":{"symbol":"A","open":"12.8900","close":"12.6800","high":"13.0600","date":"2009/03/03"}},
{"id":"03-03-2009:AA","key":null,"value":{"symbol":"AA","open":"5.8500","close":"5.5300","high":"5.9800","date":"2009/03/03"}},
{"id":"03-03-2009:ABT","key":null,"value":{"symbol":"ABT","open":"47.3500","close":"45.8200","high":"47.5800","date":"2009/03/03"}},
{"id":"03-03-2009:ACN","key":null,"value":{"symbol":"ACN","open":"28.1300","close":"27.8000","high":"29.6700","date":"2009/03/03"}},
[...]
]}

Will I take this any further? Who knows. Limit needs to be added to select, as well as functions. There is also potential for updates and deletes, as well as simple create/drop functionality. We shall see.