JSON v JSONB: A Dive into PostgreSQL

Thursday, March 21, 2019 - Jerry Sievert

JSON, or JavaScript Object Notation, is a lightweight data interchange format that is easy for both computers and humans to deal with. The rules are pretty simple:

  • a JSON document is an object
  • an object consists of a key and a value
  • a key is a string
  • a value can be a string, number, boolean, array, object or null

So, when dealing with data, JSON has become fairly popular: it's the default document storage type for databases like CouchDB and MongoDB (though MongoDB technically uses a JSON-like format), and available in PostgreSQL as the JSON and JSONB data types.

So, why two data types? And which should you use? The first answer is pretty straightforward, and dives a little bit into the history of PostgreSQL.

The JSON data type was first introduced in version 9.2, released in September of 2012, and is a thin veneer around the TEXT type: when inserting or updating a JSON typed field, validation of the document occurs to make sure that it is valid. Otherwise, the document itself is not very complicated. Early JSON operations were limited to converting to and from arrays, but were expanded fairly quickly to include operators to retrieve elements of a document by key as TEXT.

The JSONB data type was introduced later, in version 9.4, released in December of 2014. It is based on the HSTORE data type, which is a single-dimensional key/value type. JSONB expanded on HSTORE by adding another dimension via the addition of arrays and objects. Where HSTORE was only able to be flat, JSONB is able to be as deep as you need, embedding objects in arrays inside of objects inside of array, as deep and complex as you need the document to be. In addition, JSONB adds operators that allow for manipulation of JSONB data, such as || for concatenation, - and #- for deletion. Together, these make JSONB into a much more powerful data type than JSON.

So, why use JSON when JSONB exists? Isn't it better? Well, the answer to that can be complicated, based on your use cases.

Let's first take a look at storage. Let's say we have a 30k JSON document (28168 bytes to be exact): storing this as TEXT is pretty straightforward, but what happens when we store it as JSON? Storing 1,000 copies ends up being 5,660,672 bytes. Let's compare that to JSONB: 6,340,608 bytes. That's just over 1.12 times the amount of storage to gain the ability to do simple direct manipulation and searches using operators.

What other tradeoffs are we making? Let's take a look at query times. It's easy to execute queries by creating some simple benchmarks.

For the record, and so you can play along at home, you can find a GitHub repository with all of the benchmark code, tables, and stored procedures for setting everything up on GitHub. Note that all tests were run on a Macbook Pro running MacOS, against PostgreSQL 10.5.

Let's go ahead and put a single record into a storage table:

=# INSERT INTO source_document (doc) VALUES (json_obj()::TEXT);

And do some benchmarking for insertions:

=# SELECT test_json_insert(10000, doc::JSON) FROM source_document;

On my laptop, that gives me a solid 5787.718 ms (or just over 5 seconds). That's not too bad, given what we're doing: casting a TEXT value as JSON, which is validating the TEXT as a full JSON document, 10,000 times.

Let's compare that to the same thing in JSONB:

=# SELECT test_jsonb_insert(100000, doc::JSONB) FROM source_document;

This runs in 9725.226 ms (or just under 10 seconds). This isn't quite as good, at almost twice the amount of time to insert, but whether or not it matters comes down to how intend to use our data.

Speaking of using our data, let's take a look at how we would be able to retrieve our data. First, we'll create a simple test query in PL/PGSQL to help normalize any query times:

CREATE OR REPLACE FUNCTION plbench(query TEXT, n INT) RETURNS VOID AS $$
BEGIN
        FOR i IN 1 .. n LOOP
                EXECUTE query;
        END LOOP;
END;
$$ LANGUAGE plpgsql;

This allows us to execute a query some large number of times and get a fairly accurate representation of the average amount of time to execute a single query, and the aggregate of all of the queries is quite helpful.

Since we'll be running and discarding the results, but want to make sure that we're doing a realistic test of the amount of time a real query would take to output to some client (either psql or the language driver of your choice), we're going to CAST our results to TEXT. Let's again start with JSON, remembering that we should still have 10,000 rows waiting for us:

=# SELECT plbench('SELECT doc::TEXT FROM test_json', 20);

A nice round 20 (remember, this comes out to 200,000 rows) gives me 11728.708 ms (or just under 12 seconds). Let's go ahead and compare that to JSONB:

=# SELECT plbench('SELECT doc::TEXT FROM test_jsonb', 20);

Another 20 (or 200,000 rows), but not nearly as fast at 43733.030 ms (or just under 44 seconds). This is nearly one quarter the speed as using straight JSON.

So far, there has yet to be a compelling reason to use JSONB over JSON, but we've only just started looking at the possibilities. We've inserted, we've done document level querying, but what about direct querying of the data itself?

Let's start out with JSON, again sticking with 20 iterations, since we're dealing with 10,000 documents:

=# SELECT plbench('SELECT doc -> ''_id'' FROM test_json', 20);

This returns in 38927.170 ms (just under 39 seconds). Not great, but how will JSONB fare?

=# SELECT plbench('SELECT doc -> ''_id'' FROM test_jsonb', 20);

7504.617 ms (7.5 seconds)! Huzzah! Now we're starting to see where JSONB can shine. Because the JSON object has been converted into a data structure that is easy to iterate through, finding or modifying elements of the object should be much faster, as we are seeing.

There are a ton of other operations that JSONB gives us, that are not available to JSON objects without either casting, or using a stored procedure in a language that understands JSON natively. For these tests, I've chosen PLV8, specifically version 3.0alpha.

First let's look at concatenation. JSONB has an operator available ||. This time, instead of casting to TEXT, we're just going to be benchmarking the operation directly, as it can be used for UPDATE or DELETE, as well as SELECT. Let's start with JSONB:

=# SELECT plbench('SELECT doc || ''{ "foo": "bar" }''::JSONB FROM test_jsonb', 20);

Again, we're sticking to 20 iterations, since we're dealing with 10,000 documents, which finishes in 41390.995 ms (just over 41 seconds). We can do the same thing with JSON by casting:

=# SELECT plbench('SELECT doc::JSONB || ''{ "foo": "bar" }''::JSONB FROM test_json', 20);

This finishes in 96797.946 ms (just under 97 seconds), taking nearly 2.5 times longer. This is pretty significant.

=# SELECT plbench('SELECT plv8_concat_jsonb(doc, ''{ "foo": "bar" }''::JSONB) FROM test_jsonb', 20);

This is not fast in comparison. Completing the 20 iterations takes 348327.778 ms (over 5 minutes).

Just to be complete, let's do the same test against JSON using PLV8:

=# SELECT plbench('SELECT plv8_concat_json(doc, ''{ "foo": "bar" }''::JSON) FROM test_json', 20);

Again, not fast, clocking in at 131241.480 ms (2 minutes and 11 seconds), but much better than the same functionality in PLV8 using JSONB, but still slower than casting JSON to JSONB, or just using JSONB to begin with. But, remember that JSONB operators are useful for the most simplistic cases, so you may need to resort to stored procedures to get the most out of both JSON and JSONB.

Now that you know a little more about JSON and JSONB and how they both perform in PostgreSQL, you can make some data backed decisions on which is more appropriate for your day to day usage. Of course no article like this would be complete without a picture of performance, so here goes. Hopefully it was worth reading until the end to see it.