Often times, I don't get to get my hands dirty at work. Not being one to let myself atrophy, I keep my eyes out for new and exciting things to catch my fancy, and spend hours and hours writing new code: usually reinventing the wheel, often times poking and prodding, just trying to figure out what I'm going to do with what I find.
One of the projects that caught my eye a bit over a year ago was CouchDB, a RESTful document storage engine, that happens to have Map/Reduce support. Being the database freak that I am, I started thinking about all of the projects I've worked on in the past that could have been improved with a document model over pseudo-relational databases. So many came to mind, and I was excited about the flexibility of CouchDB; so useful for so many things, especially with strong data analysis abilities via map and reduce.
Then came a day when all of my meetings were cancelled, and instead I sat in front of a large display with a group of other people running query after query, analyzing large amounts of data -- SQL query after SQL query. By the end of the day I was exhausted and feeling just a little bit dirty. Never one to let good dirt go to waste, I sat down with a beer in hand and started scheming.
Enter an idea: what if I could marry the ease of SQL to the flexibility of CouchDB? Actually, it was more like: what's the most amusing thing I could build after a day like today?
I'm a big fan of perl: there's nothing perl can't do, and with a wonderful repository like CPAN throwing together a script that would make the devil himself shiver becomes an easy task. Sure, perl can be so messy and complex that it can't even be parsed, but like any language, under the right circumstances it can really shine.
An hour later, I had the beginnings of a masterpiece, a bridge between the worlds of SQL and NoSQL. It was the worst abuse of technology that I could think of: a SQL front-end to CouchDB.
Something like this should never be used in production. In fact, it should probably never be used at all. It works by parsing the SQL statement, does some quick sanity checks, and builds a map function to be called with as a temporary view. These temporary views are then queried, and the response from CouchDB is printed to the terminal.
There are a couple of prerequisites, all available from CPAN:
- Getopt::Long
- SQL::Statement
- LWP::UserAgent
- HTTP::Request
- JSON
Then, just download and "enjoy": couchsql.pl
Some quick examples:
miniBook:~ jerry$ ./couchsql.pl "SELECT * FROM stock"
{"total_rows":9611,"offset":0,"rows":[
{"id":"01-02-2009:A","key":null,"value":{"_id":"01-02-2009:A","_rev":"1-28512862
2","month":1,"date":"2009/01/02","quarter":1,"high":"16.3000","close":"16.2400",
"day":2,"id":"01-02-2009:A","company":"AGILENT TECH INC","symbol":"A","epoch":"1
230854400","volume":"3030200","day_of_year":2,"day_of_quarter":2,"open":"15.6000
","type":"STOCK","year":2009}},
{"id":"01-02-2009:AA","key":null,"value":{"_id":"01-02-2009:AA","_rev":"1-170752
1035","month":1,"date":"2009/01/02","quarter":1,"high":"12.4400","close":"12.110
0","day":2,"id":"01-02-2009:AA","company":"ALCOA INC","symbol":"AA","epoch":"123
0854400","volume":"30645200","day_of_year":2,"day_of_quarter":2,"open":"11.3600"
,"type":"STOCK","year":2009}},
[...]
]}
miniBook:~ jerry$ ./couchsql.pl "SELECT symbol, open, close, high, date FROM stock WHERE date = '2009/03/01' ORDER BY symbol"
{"total_rows":98,"offset":0,"rows":[
{"id":"03-03-2009:A","key":null,"value":{"symbol":"A","open":"12.8900","close":"12.6800","high":"13.0600","date":"2009/03/03"}},
{"id":"03-03-2009:AA","key":null,"value":{"symbol":"AA","open":"5.8500","close":"5.5300","high":"5.9800","date":"2009/03/03"}},
{"id":"03-03-2009:ABT","key":null,"value":{"symbol":"ABT","open":"47.3500","close":"45.8200","high":"47.5800","date":"2009/03/03"}},
{"id":"03-03-2009:ACN","key":null,"value":{"symbol":"ACN","open":"28.1300","close":"27.8000","high":"29.6700","date":"2009/03/03"}},
[...]
]}
Will I take this any further? Who knows. Limit needs to be added to select, as well as functions. There is also potential for updates and deletes, as well as simple create/drop functionality. We shall see.
Comments (8)
Add Comment
Sam Johnston - Wed, Nov 11, 2009
http://samj.net/Thanks for again proving how utterly ridiculous the NoSQL moniker is. This is great work.
Sam
Damien Katz - Wed, Nov 11, 2009
http://damienkatz.netCool stuff! I don't think it's a crazy idea, I've often thought CouchDB views could be used as tables for SQL queries.
The crazy part is using temp views! But if you limited yourself to existing views, SQL queries with joins, etc should be entirely possible and efficient (until you start to get into a partitioned setup, then the joins are painful).
Derek Anderson - Wed, Nov 11, 2009
http://armyofevilrobots.com/Oh look? I can replace a nice ORM friendly json output with the same old SQL DSL that we all know and loathe!
Neat POC (which the author clearly has a sense of humor about), but I hope people don't jump with both feet into actually using this as a production solution.
Now perhaps someone should write an ORM on top of CouchSQL so that we can come full circle ;)
Jerry Sievert - Wed, Nov 11, 2009
Derek:don't tempt me.
Jason Huggins - Wed, Nov 11, 2009
http://saucelabs.comYou, sir, are one evil genius.
Sam Johnston - Wed, Nov 11, 2009
http://samj.net/However fugly, making SQL scale or making SQL work with "NoSQL" databases is the fast track to widespread adoption. Expecting people to throw away decades of work to do things your way (however elegant) just isn't going to work.
Sam
Alaric Snell-Pym - Thu, Nov 12, 2009
http://www.geniedb.com/Here at GenieDB, we've written a MySQL table storage engine that backs onto our distributed "NoSQL" database, so that you can 'import' one of our tables into MySQL (or onto lots of totally independent MySQL servers in a cluster) - which achieves the sort of thing you crave: MySQL does all the SQL parsing, and handles joins (even with other kinds of tables) and sorting and functions and grouping-by and all that stuff - and we just handle storing records.
It's a lot of work implementing a MySQL storage engine, but it means MySQL does a great deal of work for us - and the end result is that people can use SQL (and even MySQL's dialect of it) to talk to a replicated fault-tolerant dynamically-scalable self-healing database.
And we're looking for beta users! Get in touch at www.geniedb.com if you're interested!
Arun Srinivasan - Fri, Nov 13, 2009
myagentnotes.blogspot.comI seriously don't know why ppl are offensive on this marriage, just fun right!! Anyway, just found how un-intelligent ppl are , to compare sql to no-sql databases,its like the freakonomics book pics, tearin an orange from an apple.
SQL Relational DBs are here to stay, to process the big grand data intensive ACID rquired and very integrity dependent apps, which forms major chunk of data processing.
The funny new guys like twitter and facebook who don't care if their tweets appear in a min or 5 can use couch or mongo, these are works of seriously intelligent ppl aimed for different requirements.
Anyway, good read, will try to run your code against my mongo, just havin fun!! yeah!! we ar nerds