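How to speed up nested JSON queries in PostgreSQL?

Time: 2017-11-20 12:30:59

Tags: json postgresql performance jsonb

For the flight retail engine we are developing, we store orders as JSON documents in a PostgreSQL database.

The orders table is defined as: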

CREATE TABLE IF NOT EXISTS orders (
  id          SERIAL PRIMARY KEY,
  order_data  JSONB NOT NULL
);

A simplified version of a typical order document looks like this:

{  
   "orderID":"ORD000001",
   "invalid":false,
   "creationDate":"2017-11-19T15:49:53.897",
   "orderItems":[  
      {  
         "orderItemID":"ITEM000001",
         "flight":{  
            "id":"FL000001",
            "segments":[  
               {  
                  "origin":"FRA",
                  "destination":"LHR",
                  "departure":"2018-05-12T14:00:00",
                  "arrival":"2018-05-12T14:40:00",
                  "marketingCarrier":"LH",
                  "marketingFlightNumber":"LH908"
               }
            ]
         },
         "passenger":{  
            "lastName":"Test",
            "firstName":"Thomas",
            "passengerTypeCode":"ADT"
         }
      },
      {  
         "orderItemID":"ITEM000002",
         "flight":{  
            "id":"FL000002",
            "segments":[  
               {  
                  "origin":"LHR",
                  "destination":"FRA",
                  "departure":"2018-05-17T11:30:00",
                  "arrival":"2018-05-17T14:05:00",
                  "marketingCarrier":"LH",
                  "marketingFlightNumber":"LH905"
               }
            ]
         },
         "passenger":{  
            "lastName":"Test",
            "firstName":"Thomas",
            "passengerTypeCode":"ADT"
         }
      }
   ]
}

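The number of rows in this table can grow quite large (more than 100 million).

Creating a GIN index on "orderID" works fine and, as expected, significantly speeds up queries for orders with a specific ID.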
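The index definition itself is not shown here. As a minimal sketch, assuming a `jsonb_path_ops` GIN index over the whole document (the index name `ix_orders_data` is hypothetical), a lookup by order ID could be written with the containment operator `@>`, which is the operator this index class supports:

```sql
-- Hypothetical index name; the original index definition is not shown.
CREATE INDEX ix_orders_data ON orders USING gin (order_data jsonb_path_ops);

-- Containment lookup that a GIN index on order_data can serve.
SELECT id, order_data
FROM orders
WHERE order_data @> '{"orderID": "ORD000001"}';
```

Note that a comparison such as `order_data->>'orderID' = 'ORD000001'` would not use this GIN index; that form would need a B-tree expression index on `(order_data->>'orderID')` instead.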

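But we also need more complex requests to run quickly, such as searching for orders that contain a specific flight segment.

Thanks to this thread, I was able to write a query like this: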
SELECT *
FROM orders,
  jsonb_array_elements(order_data->'orderItems') orderItems,
  jsonb_array_elements(orderItems->'flight'->'segments') segments
WHERE order_data->>'invalid'='false'
  AND segments->>'origin'='LHR'
  AND ( (segments->>'marketingCarrier'='LH' AND segments->>'marketingFlightNumber'='LH905')
     OR (segments->>'operatingCarrier'='LH' AND segments->>'operatingFlightNumber'='LH905') )
  AND segments->>'departure' BETWEEN '2018-05-17T10:00:00' AND '2018-05-17T18:00:00'

This works fine, but it is too slow for our requirements.
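One way to see where the time goes is to inspect the query plan. The sketch below uses a shortened version of the query; it assumes no index matches the `->>` expressions, in which case the plan would typically show a sequential scan over `orders`, because every filter is applied either to the output of `jsonb_array_elements` or to an unindexed expression:

```sql
-- EXPLAIN ANALYZE actually executes the query, so on a large table
-- it takes as long as the query itself.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders,
  jsonb_array_elements(order_data->'orderItems') orderItems,
  jsonb_array_elements(orderItems->'flight'->'segments') segments
WHERE order_data->>'invalid'='false'
  AND segments->>'origin'='LHR';
```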

What is the best way to speed up such queries?

Creating a materialized view like this:
CREATE MATERIALIZED VIEW order_segments AS
SELECT id, order_data->>'orderID' AS orderID, segments->>'origin' AS origin,
  segments->>'marketingCarrier' AS marketingCarrier, segments->>'marketingFlightNumber' AS marketingFlightNumber,
  segments->>'operatingCarrier' AS operatingCarrier, segments->>'operatingFlightNumber' AS operatingFlightNumber,
  segments->>'departure' AS departure
FROM orders,
  jsonb_array_elements(order_data -> 'orderItems') orderItems,
  jsonb_array_elements(orderItems -> 'flight'->'segments') segments
WHERE order_data->>'invalid'='false';

works, but has the drawback that it is not updated automatically.
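For reference, a rough sketch of how the view could be indexed, refreshed by hand, and queried. The index name is hypothetical, and how often to refresh (for example from a scheduled job) is an assumption that depends on how stale the search results are allowed to be:

```sql
-- Hypothetical index name. Unquoted aliases are folded to lower case,
-- so marketingFlightNumber is stored as marketingflightnumber.
CREATE INDEX ix_order_segments_lookup
  ON order_segments (origin, marketingFlightNumber, departure);

-- Rebuild the view contents; this has to be repeated whenever orders change.
REFRESH MATERIALIZED VIEW order_segments;

-- Search the flattened rows instead of the JSON documents.
SELECT id, orderID
FROM order_segments
WHERE origin = 'LHR'
  AND marketingCarrier = 'LH'
  AND marketingFlightNumber = 'LH905'
  AND departure BETWEEN '2018-05-17T10:00:00' AND '2018-05-17T18:00:00';
```

A plain `REFRESH MATERIALIZED VIEW` locks the view against reads while it runs. `REFRESH MATERIALIZED VIEW CONCURRENTLY` avoids that, but it requires a unique index on the view, and this view has no obviously unique column combination, since the same segment can appear once per passenger.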

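So how can I define an index on the orders table to get fast execution times? Or is there a completely different solution?

1 answer:

Answer 0 (score: 0)

I finally found the answer to my own question:

Creating the index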

CREATE INDEX ix_order_items ON orders USING gin (((order_data->'orderItems')) jsonb_path_ops)

and using the query

SELECT DISTINCT id, order_data
FROM orders,
  jsonb_array_elements(order_data -> 'orderItems') orderItems,
  jsonb_array_elements(orderItems -> 'flight'->'segments') segments
WHERE id IN
( SELECT id
  FROM orders
  WHERE order_data->'orderItems'@>'[{"flight": {"segments": [{"origin":"LHR"}]}}]'
    AND (
      order_data->'orderItems'@>'[{"flight": {"segments": [{"marketingCarrier":"LH","marketingFlightNumber":"LH905"}]}}]'
      OR
      order_data->'orderItems'@>'[{"flight": {"segments": [{"operatingCarrier":"LH","operatingFlightNumber":"LH905"}]}}]'
    )
)
AND order_data@>'{"invalid": false}'
AND segments->>'departure' BETWEEN '2018-05-17T10:00:00' AND '2018-05-17T18:00:00'

speeds up the query from several seconds to a few milliseconds.
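One caveat with this form: each `@>` condition is checked independently, so the origin could match one segment and the flight number another, and the departure filter applies to any segment of the order. A possible variant, offered only as a sketch and not measured against the original data, puts origin and flight number into the same containment pattern and rechecks the full condition per segment inside `EXISTS`, which also removes the need for `DISTINCT`:

```sql
SELECT o.id, o.order_data
FROM orders o
WHERE o.order_data @> '{"invalid": false}'
  -- Index-friendly prefilter: one segment must contain all listed keys.
  AND (
    (o.order_data->'orderItems') @> '[{"flight": {"segments": [{"origin":"LHR","marketingCarrier":"LH","marketingFlightNumber":"LH905"}]}}]'
    OR
    (o.order_data->'orderItems') @> '[{"flight": {"segments": [{"origin":"LHR","operatingCarrier":"LH","operatingFlightNumber":"LH905"}]}}]'
  )
  -- Exact recheck, including the departure window, on a single segment.
  AND EXISTS (
    SELECT 1
    FROM jsonb_array_elements(o.order_data->'orderItems') AS item,
         jsonb_array_elements(item->'flight'->'segments') AS seg
    WHERE seg->>'origin' = 'LHR'
      AND ( (seg->>'marketingCarrier' = 'LH' AND seg->>'marketingFlightNumber' = 'LH905')
         OR (seg->>'operatingCarrier' = 'LH' AND seg->>'operatingFlightNumber' = 'LH905') )
      AND seg->>'departure' BETWEEN '2018-05-17T10:00:00' AND '2018-05-17T18:00:00'
  );
```

The containment conditions use the same expression as the `ix_order_items` index, so the planner should still be able to use it; the `EXISTS` part then only runs on the few rows that pass the prefilter.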