When Your Postgres Database and ORM Collide: Partitioning
Thursday, March 26, 2015

An ORM (Object-Relational Mapping) has become a near-essential tool of software development. Whether you agree with the model or not, it has become ubiquitous. So, what happens when your ORM is so generic that it can't actually deal with the advanced features of your database? The problem: impedance mismatch. How bad can it be? Really bad, and the workarounds can be just as bad, if not worse.
Let's talk about where things can break down: partitioning. Partitioning is a very specific optimization for databases: tables are broken up into subtables that store the data, and those subtables can be queried separately, either directly or via a CHECK constraint that confines a query to a specific subtable. Partitioning with Postgres uses a base table, tables that inherit from that table, and a trigger that puts data where it needs to go. It's a rather manual process, but it's extremely powerful and allows for a lot of options.
Setting Up
Let's look at a specific example - create the primary table:
CREATE TABLE t ( id SERIAL PRIMARY KEY, p NUMERIC, value VARCHAR );
And create the partitioned tables, using p as the partition key:
CREATE TABLE t1 (CHECK (p = 1)) INHERITS (t);
CREATE TABLE t2 (CHECK (p = 2)) INHERITS (t);
We have created two (extra) tables, but on insert we need to be able to put the data where it needs to be. This requires a TRIGGER (which requires a function to run when the INSERT happens):
CREATE OR REPLACE FUNCTION t_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.p = 1 ) THEN
        INSERT INTO t1 VALUES (NEW.*);
    ELSIF ( NEW.p = 2 ) THEN
        INSERT INTO t2 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'p out of range. It needs to be 1 or 2!';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
So here's the trigger:
CREATE TRIGGER trigger_t
BEFORE INSERT ON t
FOR EACH ROW EXECUTE PROCEDURE t_insert_trigger();
From here, whenever we insert something into the table t, it goes into the correct table (or errors if p doesn't contain 1 or 2; details, right?).
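To see the routing in action, here's a quick sanity check against the tables above (a sketch; the output assumes a fresh, empty set of tables):

-- Insert through the parent table; the BEFORE trigger routes each row.
INSERT INTO t (p, value) VALUES (1, 'one'), (2, 'two');

SELECT * FROM ONLY t;  -- empty: the parent table itself stores nothing
SELECT * FROM t1;      -- the p = 1 row landed here
SELECT * FROM t;       -- both rows, seen through inheritance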
The Problem
Postgres's trigger partitioning has very specific requirements that direct its behavior. We are specifically intercepting the INSERT and changing where the data lives in the database. This is extremely powerful, but the trigger is required to either return NULL, which causes the database to just continue on and not do anything further (which, since we are inserting into other tables, is generally what we want), or return a value, in which case that row also gets inserted into the parent table (whoa, suddenly we have two copies: one in the table we want it to be in, and one in the parent table where we don't want it).
During a normal INSERT, this is fine, but most ORMs try to be smart. They insert the data into the table on your behalf, and attempt to get the results of the INSERT back for you. Considering that the database can make changes to the data, typically by adding a unique identifier, among other things, this "smart" behavior actually is smart.
There's a pretty straightforward pattern for ORMs to use with Postgres when trying to get the latest changes on an INSERT: RETURNING *:
INSERT INTO t (p, value) VALUES (1, 'hello world') RETURNING *;
Without partitioning, this would return something like:
id: 5, p: 1, value: 'hello world'
The ORM takes this, converts it into a model, and all is good. But since we are intercepting the INSERT, inserting the data into the correct table, and returning NULL, the ORM gets nothing back. Oftentimes this causes a bunch of additional inserts, which can cause other issues. Remember, we are no longer returning the data that we inserted, so the result is NULL.
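You can watch this happen directly in psql (a sketch, using the trigger defined above):

-- The BEFORE trigger returns NULL, so nothing lands in t itself and
-- RETURNING * has no row to report:
INSERT INTO t (p, value) VALUES (1, 'hello world') RETURNING *;
-- INSERT 0 0  -- zero rows, zero results; this is what the ORM sees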
Working Around the Problem
Let's talk about working around the problem. There's a fairly straightforward way that we could work through this: we could override the insert method of whatever ORM we are using and be smart about replacing our RETURNING * with something that correctly returns the latest inserted record, inside a TRANSACTION.
The standard response is to have the trigger insert the data and return the row, so that RETURNING works but a duplicate lands in the parent table, and then create an AFTER trigger that deletes the extra row. Whoa. That's a lot of extra work (see here for more details).
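For reference, a minimal sketch of that pattern might look like this (it assumes the BEFORE trigger above is changed to RETURN NEW instead of NULL; the function and trigger names here are illustrative):

CREATE OR REPLACE FUNCTION t_delete_parent_row()
RETURNS TRIGGER AS $$
BEGIN
    -- The row now exists twice: in the child (inserted by the BEFORE
    -- trigger) and in the parent (because the BEFORE trigger returned NEW,
    -- which is what makes RETURNING * work). ONLY restricts the DELETE
    -- to the parent's own copy.
    DELETE FROM ONLY t WHERE id = NEW.id;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_t_cleanup
AFTER INSERT ON t
FOR EACH ROW EXECUTE PROCEDURE t_delete_parent_row();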
That's a lot of machinery, so let's look at how we should be able to improve on it. We can begin a TRANSACTION, do the INSERT, and SELECT the latest ID by using CURRVAL():
BEGIN;
INSERT INTO t (p, value) VALUES (1, 'hello world');
SELECT * FROM t WHERE id = CURRVAL('t_id_seq');
COMMIT;
Or, if your ORM is using a language like Ruby with Sequel:
execute_insert(sql)
tn = self.model.table_name.to_s
returning_sql = "SELECT * FROM #{tn} WHERE id=currval('#{tn}_id_seq');"
returning_fetch_rows(returning_sql, &block)
This sucks. No, really, it's not good. Querying against CURRVAL executes a sequential scan against every table in the partition scheme, because CURRVAL() is a volatile function and the planner can't treat it as a constant. If we have 25 tables, that means 25 scans reading everything off of disk and churning through it (this is also sometimes known as a full table scan). When you're dealing with millions of rows of data, this becomes extremely expensive. Seriously, don't use CURRVAL() here: the sequence value itself is cheap to fetch, but the lookup wrapped around it is not, and you'll be watching it scan millions of rows of data, possibly taking seconds per INSERT.
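If you want to see this for yourself, EXPLAIN makes it visible (a sketch; the exact plan shape varies by Postgres version and settings):

EXPLAIN SELECT * FROM t WHERE id = CURRVAL('t_id_seq');
-- Expect an Append over t, t1, t2, ... with a Seq Scan on each child and
-- a Filter on currval(), because the planner can't treat a volatile
-- function call as a plain constant.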
So, let's look at something that could be better:
tn = self.model.table_name.to_s
row = DB.fetch("SELECT nextval('#{tn}_id_seq') AS next;").server(:default).first
id = row[:next]
values[0][:id] = id
sql = insert_sql(*values)
execute_insert(sql)
returning_sql = "SELECT * FROM #{tn} WHERE id=#{id};"
returning_fetch_rows(returning_sql, &block)
It's more work, but here we're getting the NEXTVAL() of the sequence up front, assigning it to the id, and doing the INSERT from there. It requires visibility into both the database and the code, but it ultimately means a huge improvement overall.
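In plain SQL, the same flow looks something like this (a sketch; the literal id is what lets the planner do a cheap lookup, assuming each partition carries an index on id):

SELECT nextval('t_id_seq');                      -- say this returns 42
INSERT INTO t (id, p, value) VALUES (42, 1, 'hello world');
SELECT * FROM t WHERE id = 42;                   -- a literal, not a volatile call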
In our case, it meant a change from a constant CPU load of 25% to a CPU load of less than 2%. This is significant.
By understanding the full path of how data is dealt with, we can make major optimizations that have big impacts on costs and infrastructure.
ORMs are stupid (and so are databases), but they don't have to be. A little bit of visibility into both your ORM and your database can make a massive difference in both performance and cost.