ftisiot ideas about food and life
who talks suggestions soft json-pg

How to index and query a JSON object in PostgreSQL®?

Published Jan 6, 2023 by in Postgresql, Json, Jsonb, Index, Query at https://ftisiot.net/postgresqljson/how-to-index-and-query-json-postgresql/

PostgreSQL® offers two types of data types to handle JSON data, JSON and JSONB. This doc showcases how to index a JSONB column with a GIN index.

Some of the examples are based on this blog.

NOTE: To review the differences between JSON and JSONB check out the related article.

NOTE: more info is available in the PostgreSQL JSON functions documentation page

The dataset

The dataset is the following:

{
    "id": 778,
    "shop": "Luigis Pizza",
    "name": "Edward Olson",
    "phoneNumbers":
        ["(935)503-3765x4154","(935)12345"],
    "address": "Unit 9398 Box 2056\nDPO AP 24022",
    "image": null,
    "pizzas": [
        {
            "pizzaName": "Salami",
            "additionalToppings": ["🥓", "🌶️"]
        },
        {
            "pizzaName": "Margherita",
            "additionalToppings": ["🍌", "🌶️", "🍍"]
        }
    ]
}
Check out the description of the fields The following examples use a pizza order dataset with an order having:
  • id: 778
  • shop: “Luigis Pizza”
  • name: “Edward Olson”
  • phoneNumbers:["(935)503-3765x4154","(935)12345"]
  • address: “Unit 9398 Box 2056\nDPO AP 24022”
  • image: null
  • and two pizzas contained in the pizzas item:
[
    {
        "pizzaName": "Salami",
        "additionalToppings": ["🥓", "🌶️"]
    },
    {
        "pizzaName": "Margherita",
        "additionalToppings": ["🍌", "🌶️", "🍍"]
    }
]
If you want to reproduce the examples, check how to recreate the dataset

It can be recreated with the following script:

create table test(id serial, json_data jsonb);

insert into test(json_data) values (
'{
    "id": 778,
    "shop": "Luigis Pizza",
    "name": "Edward Olson",
    "phoneNumbers":
        ["(935)503-3765x4154","(935)12345"],
    "address": "Unit 9398 Box 2056\nDPO AP 24022",
    "image": null,
    "pizzas": [
        {
            "pizzaName": "Salami",
            "additionalToppings": ["🥓", "🌶️"]
        },
        {
            "pizzaName": "Margherita",
            "additionalToppings": ["🍌", "🌶️", "🍍"]
        }
    ]
}');

Create a GIN index on the JSONB column

To create a GIN index over the JSONB column:

create index json_data_gin on test using gin (json_data);

Retrieve all rows with a value

To retrieve all rows having the shop name Luigis Pizza, you can use the @> operator:

select * from test where json_data @> '{"shop":"Luigis Pizza"}';

Result

 id |                                                                                                                                                           json_data
----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  1 | {"id": 778, "name": "Edward Olson", "shop": "Luigis Pizza", "image": null, "pizzas": [{"pizzaName": "Salami", "additionalToppings": ["🥓", "🌶️"]}, {"pizzaName": "Margherita", "additionalToppings": ["🍌", "🌶️", "🍍"]}], "address": "Unit 9398 Box 2056\nDPO AP 24022", "phoneNumbers": ["(935)503-3765x4154", "(935)12345"]}
(1 row)

Checking with EXPLAIN:

explain analyse select * from test where json_data @> '{"shop":"Luigis Pizza"}';

Result shows the usage of the index:

                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=41.00..42.01 rows=1 width=394) (actual time=0.398..0.399 rows=1 loops=1)
   Recheck Cond: (json_data @> '{"shop": "Luigis Pizza"}'::jsonb)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on json_data_gin  (cost=0.00..41.00 rows=1 width=0) (actual time=0.389..0.390 rows=1 loops=1)
         Index Cond: (json_data @> '{"shop": "Luigis Pizza"}'::jsonb)
 Planning Time: 0.087 ms
 Execution Time: 0.470 ms
(7 rows)

But, if weyou filter for a specific subitem, like:

explain analyse select * from test where json_data -> 'pizzas' @> '{"pizzaName":"Margherita"}';

You get a sequence scan:

                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..470.13 rows=69 width=394) (actual time=2.064..2.064 rows=0 loops=1)
   Filter: ((json_data -> 'pizzas'::text) @> '{"pizzaName": "Margherita"}'::jsonb)
   Rows Removed by Filter: 6939
 Planning Time: 0.056 ms
 Execution Time: 2.085 ms
(5 rows)

If the above filter is common, you might want to create specific indexes for a single item (pizzas in this case).

Create an index on a specific JSONB item

To create an index for a specific item like the pizzas you can:

CREATE INDEX pizzas_gin ON test USING GIN ((json_data -> 'pizzas'));

Retrieve all rows with a specific filter

To filter for a specific item value, like pizzas having pizzaName equal to Margherita you can use the @> operator:

explain analyse select * from test where json_data -> 'pizzas' @> '{"pizzaName":"Margherita"}';

Results

 id |                                                                                                                                                           json_data
----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  1 | {"id": 778, "name": "Edward Olson", "shop": "Luigis Pizza", "image": null, "pizzas": [{"pizzaName": "Salami", "additionalToppings": ["🥓", "🌶️"]}, {"pizzaName": "Margherita", "additionalToppings": ["🍌", "🌶️", "🍍"]}], "address": "Unit 9398 Box 2056\nDPO AP 24022", "phoneNumbers": ["(935)503-3765x4154", "(935)12345"]}
(1 row)

Checking with EXPLAIN:

explain analyse select * from test where json_data @> '{"pizzas": [{"pizzaName":"Margherita"}]}';

Results, the index is used:

                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=43.00..44.01 rows=1 width=394) (actual time=0.493..0.494 rows=1 loops=1)
   Recheck Cond: (json_data @> '{"pizzas": [{"pizzaName": "Margherita"}]}'::jsonb)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on json_data_gin  (cost=0.00..43.00 rows=1 width=0) (actual time=0.483..0.483 rows=1 loops=1)
         Index Cond: (json_data @> '{"pizzas": [{"pizzaName": "Margherita"}]}'::jsonb)
 Planning Time: 0.186 ms
 Execution Time: 0.523 ms
(7 rows)

Francesco Tisiot

Mastodon Francesco comes from Verona, Italy and works as a Senior Developer Advocate at Aiven. With his many years of experience as a data analyst, he has stories to tell and advice for data-wranglers everywhere. Francesco loves sharing knowledge with others as a speaker and writer, and is on a mission to defend the world from bad Italian food!

Story logo

© 2023