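For the flight retail engine we are developing, we store orders as JSON documents in a PostgreSQL database.
The orders table is defined as:
CREATE TABLE IF NOT EXISTS orders (
    id SERIAL PRIMARY KEY,
    order_data JSONB NOT NULL
);
A simplified version of a typical order document looks like this: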
{
  "orderID": "ORD000001",
  "invalid": false,
  "creationDate": "2017-11-19T15:49:53.897",
  "orderItems": [
    {
      "orderItemID": "ITEM000001",
      "flight": {
        "id": "FL000001",
        "segments": [
          {
            "origin": "FRA",
            "destination": "LHR",
            "departure": "2018-05-12T14:00:00",
            "arrival": "2018-05-12T14:40:00",
            "marketingCarrier": "LH",
            "marketingFlightNumber": "LH908"
          }
        ]
      },
      "passenger": {
        "lastName": "Test",
        "firstName": "Thomas",
        "passengerTypeCode": "ADT"
      }
    },
    {
      "orderItemID": "ITEM000002",
      "flight": {
        "id": "FL000002",
        "segments": [
          {
            "origin": "LHR",
            "destination": "FRA",
            "departure": "2018-05-17T11:30:00",
            "arrival": "2018-05-17T14:05:00",
            "marketingCarrier": "LH",
            "marketingFlightNumber": "LH905"
          }
        ]
      },
      "passenger": {
        "lastName": "Test",
        "firstName": "Thomas",
        "passengerTypeCode": "ADT"
      }
    }
  ]
}
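The number of rows in this table can grow very large (more than 100 million).
Creating a GIN index on "orderID" works fine and, as expected, significantly speeds up queries for an order with a specific ID.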
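For reference, such an index and a lookup that can use it might look roughly like this. It is only a sketch: the index name ix_order_id is hypothetical, and the exact index definition used in our setup is not shown here, so the expression and the default operator class are assumptions.

```sql
-- Hypothetical index name; expression index on the orderID value only.
CREATE INDEX ix_order_id ON orders USING gin ((order_data->'orderID'));

-- A GIN index on jsonb serves the containment operator @>,
-- not text comparisons such as order_data->>'orderID' = '...'.
SELECT id, order_data
FROM orders
WHERE order_data->'orderID' @> '"ORD000001"';
```

The right-hand side is a JSON string literal (note the inner double quotes), because @> compares jsonb values.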
But we also need more complex requests to execute quickly, for example searching for orders that contain a specific flight segment.
Thanks to this thread, I was able to write a request like
SELECT *
FROM orders,
     jsonb_array_elements(order_data->'orderItems') orderItems,
     jsonb_array_elements(orderItems->'flight'->'segments') segments
WHERE order_data->>'invalid'='false'
  AND segments->>'origin'='LHR'
  AND ( (segments->>'marketingCarrier'='LH' AND segments->>'marketingFlightNumber'='LH905')
     OR (segments->>'operatingCarrier'='LH' AND segments->>'operatingFlightNumber'='LH905') )
  AND segments->>'departure' BETWEEN '2018-05-17T10:00:00' AND '2018-05-17T18:00:00'
This works, but it is too slow for our requirements.
What is the best way to speed up this kind of query?
Creating a materialized view like
CREATE MATERIALIZED VIEW order_segments AS
SELECT id,
       order_data->>'orderID' AS orderID,
       segments->>'origin' AS origin,
       segments->>'marketingCarrier' AS marketingCarrier,
       segments->>'marketingFlightNumber' AS marketingFlightNumber,
       segments->>'operatingCarrier' AS operatingCarrier,
       segments->>'operatingFlightNumber' AS operatingFlightNumber,
       segments->>'departure' AS departure
FROM orders,
     jsonb_array_elements(order_data -> 'orderItems') orderItems,
     jsonb_array_elements(orderItems -> 'flight'->'segments') segments
WHERE order_data->>'invalid'='false';
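works, but has the drawback that the view is not updated automatically when the orders table changes.
So, how can I define an index on the orders table to achieve fast execution times? Or is there a completely different solution?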
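To make the materialized-view drawback concrete, here is a rough sketch of what keeping it usable would involve. The index names are hypothetical; the columns are the ones from the view definition above.

```sql
-- Hypothetical index names; a materialized view can be indexed like a table.
CREATE INDEX ix_order_segments_marketing
    ON order_segments (marketingCarrier, marketingFlightNumber, departure);
CREATE INDEX ix_order_segments_operating
    ON order_segments (operatingCarrier, operatingFlightNumber, departure);

-- Recomputes the whole view; has to be re-run whenever orders change.
REFRESH MATERIALIZED VIEW order_segments;
```

A plain REFRESH locks the view against reads while it runs. REFRESH MATERIALIZED VIEW CONCURRENTLY avoids that, but it requires a unique index on the view, and the view as defined has no obvious unique key because two order items can share the same segment.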
Answer 0 (score: 0)
I finally found the answer to my own question:
Setting up the index
CREATE INDEX ix_order_items ON orders USING gin (((order_data->'orderItems')) jsonb_path_ops);
and using the request
SELECT DISTINCT id, order_data
FROM orders,
     jsonb_array_elements(order_data -> 'orderItems') orderItems,
     jsonb_array_elements(orderItems -> 'flight'->'segments') segments
WHERE id IN
      ( SELECT id
        FROM orders
        WHERE order_data->'orderItems'@>'[{"flight": {"segments": [{"origin":"LHR"}]}}]'
          AND (
               order_data->'orderItems'@>'[{"flight": {"segments": [{"marketingCarrier":"LH","marketingFlightNumber":"LH905"}]}}]'
               OR
               order_data->'orderItems'@>'[{"flight": {"segments": [{"operatingCarrier":"LH","operatingFlightNumber":"LH905"}]}}]'
              )
      )
  AND order_data@>'{"invalid": false}'
  AND segments->>'departure' BETWEEN '2018-05-17T10:00:00' AND '2018-05-17T18:00:00'
sped the request up from several seconds to a few milliseconds.
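One caveat: the separate containment conditions can be satisfied by different segments of the same order (one segment departing from LHR, another one carrying flight LH905). If origin and flight number are meant to match on the same segment, which is an assumption about the intent, a possible tightening is to put them into a single containment pattern. The sketch below also wraps the index-backed part in EXPLAIN to check that ix_order_items is actually used.

```sql
-- Shows the plan and real timings; look for a bitmap scan on ix_order_items.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM orders
WHERE order_data->'orderItems' @> '[{"flight": {"segments": [{"origin":"LHR","marketingCarrier":"LH","marketingFlightNumber":"LH905"}]}}]'
   OR order_data->'orderItems' @> '[{"flight": {"segments": [{"origin":"LHR","operatingCarrier":"LH","operatingFlightNumber":"LH905"}]}}]';
```

Each pattern requires one element of "segments" to contain all three key/value pairs. The departure range still has to be checked on the unnested segments, since @> tests equality only and cannot express BETWEEN.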