How to index and query a JSON object in PostgreSQL®?

January 6, 2023 - 4 minutes read - 729 words

PostgreSQL® offers two types of data types to handle JSON data, JSON and JSONB. This doc showcases how to index a JSONB column with a GIN index.

Some of the examples are based on this blog.

NOTE: To review the differences between JSON and JSONB check out the related article.

NOTE: more info is available in the PostgreSQL JSON functions documentation page

👉 Want a FREE PostgreSQL database?👈
🦀 Check Aiven’s FREE plans! 🦀

⚡️ Want to optimize your SQL query with AI? ⚡️
Check Aiven AI Database Optimizer!
Powered by EverSQL, it provides you index and SQL rewrite recommendations to take your database performance to the next level

The dataset

The dataset is the following:

{
    "id": 778,
    "shop": "Luigis Pizza",
    "name": "Edward Olson",
    "phoneNumbers":
        ["(935)503-3765x4154","(935)12345"],
    "address": "Unit 9398 Box 2056\nDPO AP 24022",
    "image": null,
    "pizzas": [
        {
            "pizzaName": "Salami",
            "additionalToppings": ["🥓", "🌶️"]
        },
        {
            "pizzaName": "Margherita",
            "additionalToppings": ["🍌", "🌶️", "🍍"]
        }
    ]
}

Check out the description of the fields

The following examples use a pizza order dataset with an order having:

id: 778
shop: “Luigis Pizza”
name: “Edward Olson”
phoneNumbers:["(935)503-3765x4154","(935)12345"]
address: “Unit 9398 Box 2056\nDPO AP 24022”
image: null
and two pizzas contained in the pizzas item:

[
    {
        "pizzaName": "Salami",
        "additionalToppings": ["🥓", "🌶️"]
    },
    {
        "pizzaName": "Margherita",
        "additionalToppings": ["🍌", "🌶️", "🍍"]
    }
]

If you want to reproduce the examples, check how to recreate the dataset

It can be recreated with the following script:

create table test(id serial, json_data jsonb);

insert into test(json_data) values (
'{
    "id": 778,
    "shop": "Luigis Pizza",
    "name": "Edward Olson",
    "phoneNumbers":
        ["(935)503-3765x4154","(935)12345"],
    "address": "Unit 9398 Box 2056\nDPO AP 24022",
    "image": null,
    "pizzas": [
        {
            "pizzaName": "Salami",
            "additionalToppings": ["🥓", "🌶️"]
        },
        {
            "pizzaName": "Margherita",
            "additionalToppings": ["🍌", "🌶️", "🍍"]
        }
    ]
}');

Create a GIN index on the JSONB column

To create a GIN index over the JSONB column:

create index json_data_gin on test using gin (json_data);

Retrieve all rows with a value

To retrieve all rows having the shop name Luigis Pizza, you can use the @> operator:

select * from test where json_data @> '{"shop":"Luigis Pizza"}';

Result

 id |                                                                                                                                                           json_data
----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  1 | {"id": 778, "name": "Edward Olson", "shop": "Luigis Pizza", "image": null, "pizzas": [{"pizzaName": "Salami", "additionalToppings": ["🥓", "🌶️"]}, {"pizzaName": "Margherita", "additionalToppings": ["🍌", "🌶️", "🍍"]}], "address": "Unit 9398 Box 2056\nDPO AP 24022", "phoneNumbers": ["(935)503-3765x4154", "(935)12345"]}
(1 row)

Checking with EXPLAIN:

explain analyse select * from test where json_data @> '{"shop":"Luigis Pizza"}';

Result shows the usage of the index:

                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=41.00..42.01 rows=1 width=394) (actual time=0.398..0.399 rows=1 loops=1)
   Recheck Cond: (json_data @> '{"shop": "Luigis Pizza"}'::jsonb)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on json_data_gin  (cost=0.00..41.00 rows=1 width=0) (actual time=0.389..0.390 rows=1 loops=1)
         Index Cond: (json_data @> '{"shop": "Luigis Pizza"}'::jsonb)
 Planning Time: 0.087 ms
 Execution Time: 0.470 ms
(7 rows)

But, if weyou filter for a specific subitem, like:

explain analyse select * from test where json_data -> 'pizzas' @> '{"pizzaName":"Margherita"}';

You get a sequence scan:

                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..470.13 rows=69 width=394) (actual time=2.064..2.064 rows=0 loops=1)
   Filter: ((json_data -> 'pizzas'::text) @> '{"pizzaName": "Margherita"}'::jsonb)
   Rows Removed by Filter: 6939
 Planning Time: 0.056 ms
 Execution Time: 2.085 ms
(5 rows)

If the above filter is common, you might want to create specific indexes for a single item (pizzas in this case).

Create an index on a specific JSONB item

To create an index for a specific item like the pizzas you can:

CREATE INDEX pizzas_gin ON test USING GIN ((json_data -> 'pizzas'));

Retrieve all rows with a specific filter

To filter for a specific item value, like pizzas having pizzaName equal to Margherita you can use the @> operator:

explain analyse select * from test where json_data -> 'pizzas' @> '{"pizzaName":"Margherita"}';

Results

 id |                                                                                                                                                           json_data
----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  1 | {"id": 778, "name": "Edward Olson", "shop": "Luigis Pizza", "image": null, "pizzas": [{"pizzaName": "Salami", "additionalToppings": ["🥓", "🌶️"]}, {"pizzaName": "Margherita", "additionalToppings": ["🍌", "🌶️", "🍍"]}], "address": "Unit 9398 Box 2056\nDPO AP 24022", "phoneNumbers": ["(935)503-3765x4154", "(935)12345"]}
(1 row)

Checking with EXPLAIN:

explain analyse select * from test where json_data @> '{"pizzas": [{"pizzaName":"Margherita"}]}';

Results, the index is used:

                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=43.00..44.01 rows=1 width=394) (actual time=0.493..0.494 rows=1 loops=1)
   Recheck Cond: (json_data @> '{"pizzas": [{"pizzaName": "Margherita"}]}'::jsonb)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on json_data_gin  (cost=0.00..43.00 rows=1 width=0) (actual time=0.483..0.483 rows=1 loops=1)
         Index Cond: (json_data @> '{"pizzas": [{"pizzaName": "Margherita"}]}'::jsonb)
 Planning Time: 0.186 ms
 Execution Time: 0.523 ms
(7 rows)