PostgreSQL is an amazing and extensible database, providing a ton of functionality. One of the best parts, in my opinion, is the ability to add additional programming languages to create stored procedures. This allows developers to move business logic deeper into the database itself. Unfortunately, this is often very hard to test in isolation.
The current state of testing in PostgreSQL is to run the regression suite, which executes a myriad of SQL commands living in files in one directory, with the results checked against a bunch more files living in a second directory. This works, but it is often difficult to add to existing test suites, which ultimately leaves holes in testing.
All databases consist of a handful of operations: Create, Read, Update, Delete, and sometimes Scan. If this sounds a lot like reading and writing files on a disk, it is. At their core, databases need access to both data and metadata. An operating system provides this in a very basic form of a filesystem. As a simple approach to building a database, the reliance on a filesystem is key, as it provides a lot of metadata, and allows us to move forward with one less layer to deal with.
It's silly, really: rebuilding one of the things that has been considered a "solved problem" for a number of years. But, I've always felt that in order to understand a problem fully, one must understand the problems that those who came before us were trying to solve; to take a trip through their design decisions, and to understand why they chose the solutions that they did.
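To make the Create, Read, Update, Delete, and Scan operations concrete, here is a minimal sketch of a store that leans entirely on the filesystem. It is not the code from the post: the `FileStore` class, its method names, and the one-file-per-key layout are all assumptions chosen for illustration.

```python
import os


class FileStore:
    """Toy key-value store (hypothetical): each key is one file in a directory."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # Restrict keys to simple names so they cannot escape the root directory.
        if not key.replace("_", "").isalnum():
            raise ValueError(f"invalid key: {key!r}")
        return os.path.join(self.root, key)

    def create(self, key, value):
        # Mode "x" raises FileExistsError if the key is already present.
        with open(self._path(key), "x", encoding="utf-8") as handle:
            handle.write(value)

    def read(self, key):
        with open(self._path(key), "r", encoding="utf-8") as handle:
            return handle.read()

    def update(self, key, value):
        path = self._path(key)
        if not os.path.exists(path):
            raise KeyError(key)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(value)

    def delete(self, key):
        os.remove(self._path(key))

    def scan(self):
        # The directory listing is metadata the filesystem maintains for us.
        return sorted(os.listdir(self.root))
```

The directory listing does the work of an index here, which is the "one less layer" the filesystem buys; a real database would still have to deal with concurrency and durability, which this sketch ignores.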
An ORM (Object-Relational Mapping) has become a near-essential tool of software development. Whether you agree with the model or not, it has become ubiquitous. So, what happens when your ORM is so generic that it can't actually deal with the advanced features of your database? Problem: impedance mismatch. How bad can it be? Really bad, and the workarounds can be just as bad, if not worse.
There comes a time in every community where members of that community must step back and take a look at how they appear to behave to those outside of it. The operative word is appear, and that's the part I want to focus on.
In the first installment we dealt with creating collections and deep inspection of the JSON object once it was inserted. In this installment, we will be covering saving the data and building WHERE clauses from MongoDB queries in order to retrieve the data that we've written.
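As a rough idea of what building a WHERE clause from a MongoDB query can look like, the sketch below translates a flat query document into SQL text plus bound parameters. It is not the code from the series: the `build_where` function, the `data` column name, and the `%s` placeholder style are assumptions, and only string and number comparisons are handled. It relies on PostgreSQL's `->>` operator, which extracts a JSON field as text.

```python
# Supported MongoDB comparison operators and their SQL equivalents.
OPERATORS = {"$eq": "=", "$ne": "<>", "$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<="}


def build_where(query, column="data"):
    """Translate a flat MongoDB-style query into (where_clause, params)."""
    clauses = []
    params = []
    for field, condition in query.items():
        # A bare value means equality; a dict holds operator/value pairs.
        if not isinstance(condition, dict):
            condition = {"$eq": condition}
        for op, value in condition.items():
            if op not in OPERATORS:
                raise ValueError(f"unsupported operator: {op}")
            # The field name is bound as a parameter rather than inlined.
            extract = f"{column} ->> %s"
            if isinstance(value, bool) or not isinstance(value, (int, float, str)):
                raise TypeError(f"unsupported value type for {field!r}")
            if isinstance(value, (int, float)):
                # ->> yields text, so cast before comparing numbers.
                extract = f"({extract})::numeric"
            clauses.append(f"{extract} {OPERATORS[op]} %s")
            params.extend([field, value])
    if not clauses:
        return "TRUE", []
    return " AND ".join(clauses), params
```

For example, `build_where({"name": "Jill", "age": {"$gte": 21}})` yields a clause with four placeholders and the parameters `["name", "Jill", "age", 21]`, ready to hand to a driver that accepts `%s` placeholders.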
I had a crazy thought. Don't all good ideas start with that phrase? Well, this one was suitably crazy: why not build my own version of MongoDB right on top of Postgres? It sounds a little far-fetched, but in all honesty it's pretty simple.
It's been a few months since I released Bricks.js, and I figured it was finally time to talk about it. Bricks is a fast and extremely modular web application framework built on top of Node.js that works a little differently.
I had been meaning to spend some time with Judy Arrays but I hadn't quite found a good reason to explore them to their full extent. While attending NodeConf I caught Marco Rogers' talk on C++ bindings for Node.js which gave me a fantastic reason to spend some time in the Judy world. A few days later I came up with this project: Judy Arrays in Node.js. Unfortunately, it's been about half of a decade since my last foray into C++, and at least a decade and a half before that via academia, so while all attempts have been made to adhere to best standards of Node.js add-on development, I cannot guarantee that everything is 100% correct and that there are no memory leaks.
It's not every day that I find myself in a conundrum -- not just any conundrum, but a moral one. I rarely think of computers and software in the terms of morality: right and wrong, good and bad, but instead the expression of ideas, a beautiful manifestation of thought. This time things are different.
Let me back up a little bit. It was the height of the WikiLeaks release of the Iraq War Logs and I was outraged. Typically with outrage comes the desire to do something about it: this was no exception. I figured that I could do something about it, but I wasn't willing to risk my own hide by hosting a copy of the data on my own servers. Laws here in the US are fickle. I could be in the clear, but still end up on some watch list. I could legally be OK, but if my travel is suddenly impeded, that's a bad thing. I may be idealistic, but when it comes to the possibility of losing my livelihood or being put on some sort of watch list I tend to take a step back.
It was summer and I was craving pork. Not just any pork, but Tails and Trotters pork -- fed with hazelnuts and absolutely delicious. I somehow convinced my lovely partner-in-crime to split the cost of half of a pig; there was only one problem: we had a freezer, but its pedigree was entirely unknown. Rather than take a chance on losing a whole lot of yummy, a plan was hatched: we'd bring the freezer into the 21st century (or at least the monitoring of it).
Introducing my first GitHub repository: node-date-utils. During redevelopment of my freezer daemon (more to come later), I found a couple of missing Date methods. This is an attempt to fill some of them in.
When needing two or more fairly disparate systems to work together seamlessly, having complete flexibility at the database level can be a blessing.
Take, for instance, the problem of a ten-year-old legacy system hosting millions of accounts, and an up-to-date content management system that needs complete access to that data as if it were its own. You can manage multiple systems with complicated triggers, methods for moving data around, expensive joins, funky stored procedures, and hacks to the code, or you can simply use a writable view.
SQL is everywhere. Believe it or not there are legacy relational "schema-with" databases filled with data all over the internet. Chances are even your own office has at least one SQL database lurking in a closet somewhere.
So, how do you leverage your existing "schema-with" databases and still be able to use the power of Map/Reduce? Introducing MR SQL: A Map/Reduce Front-End to SQL.
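The general idea of putting Map/Reduce in front of SQL can be sketched as follows. This is not MR SQL itself: the `map_reduce` function and its signature are hypothetical, and Python's built-in `sqlite3` module stands in for whatever SQL database is actually lurking in the closet.

```python
import sqlite3
from collections import defaultdict


def map_reduce(connection, sql, map_fn, reduce_fn):
    """Run map_fn over every row of a query, then reduce the values per key."""
    cursor = connection.cursor()
    # sqlite3.Row lets each row be converted into a column-name dict.
    cursor.row_factory = sqlite3.Row
    grouped = defaultdict(list)
    for row in cursor.execute(sql):
        # map_fn emits zero or more (key, value) pairs per row.
        for key, value in map_fn(dict(row)):
            grouped[key].append(value)
    # reduce_fn collapses all values emitted for one key into a single result.
    return {key: reduce_fn(key, values) for key, values in grouped.items()}
```

A map function such as `lambda row: [(row["customer"], row["total"])]` paired with a reduce function such as `lambda key, values: sum(values)` would total a hypothetical `orders` table per customer, leaving the relational schema untouched.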
Oftentimes, I don't get to get my hands dirty at work. Not being one to let myself atrophy, I keep my eyes out for new and exciting things to catch my fancy, and spend hours and hours writing new code: usually reinventing the wheel, oftentimes poking and prodding, just trying to figure out what I'm going to do with what I find.
One of the projects that caught my eye a bit over a year ago was CouchDB, a RESTful document storage engine, that happens to have Map/Reduce support. Being the database freak that I am, I started thinking about all of the projects I've worked on in the past that could have been improved with a document model over pseudo-relational databases. So many came to mind, and I was excited about the flexibility of CouchDB; so useful for so many things, especially with strong data analysis abilities via map and reduce.
Starting a new job is always difficult: coming up to speed on essential projects, finding your niche, even remembering everyone's names. Then there's the added challenge of starting a job at a partner of Google. Backing up a bit, my new job was approached by Google as both a data provider, and a partner in their new Social App section of iGoogle.
They say that a picture is worth a thousand words. Unfortunately, it's also worth 1000 bytes or more. In my quest to build the ultimate game of solitaire, I needed cards, 52 of them to be exact. That's a lot of bandwidth for something as simple as cards. I decided to try to fix this problem by making my cards using XHTML and CSS.