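I am running a simple query that retrieves data from a single table. If I select only the non-JSON columns, the query takes 16 ms. If I include a column that references a field inside the JSONB data, the time increases by a factor of 62. If I select two different JSONB fields, it doubles again.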
--EXPLAIN (ANALYZE,buffers)
SELECT
segment as segment_no,
begin_time,
segment_data::json->'summary'->'begin_mileage' as begin_mileage,
segment_data::json->'summary'->'end_mileage' as end_mileage
FROM
segments_table
WHERE
vehicle=12 AND trip=3
ORDER BY
begin_time;
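The query takes 2.0 seconds with both JSON fields in the SELECT clause. Omitting one brings it down to 1.0 second, and omitting both brings it down to 16 ms.
The table itself has about 700 rows. The query returns 83 rows. Running different queries, I noticed that when selecting the 2 JSON fields, the more rows are returned, the longer the query takes to complete (roughly 0.0066 * X^1.32 ms).
I tried adding indexes for the vehicle and trip lookups, but that made little difference (as expected). The time seems to go into actually retrieving the data and looking up the fields inside the JSONB value. If the JSON fields were needed in the WHERE clause, this degradation would be easier to understand, but that is not the case.
A simple solution would of course be to pull each field out of the JSON blob and create a separate column for it in the table. But before I go down that road, is there anything else that could fix this performance problem?
Here are the results of ANALYZE: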
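For reference, a minimal sketch of what that separate-column approach could look like. The column names and the numeric type are assumptions for illustration (the stored values may need a different type), and the JSON keys follow the queries in this post.

```sql
-- Hypothetical columns; numeric is an assumption about the stored values.
ALTER TABLE public.segments_table
    ADD COLUMN begin_mileage numeric,
    ADD COLUMN end_mileage   numeric;

-- Backfill from the existing JSONB data.
-- ->> returns text, which is then cast to the column type.
UPDATE public.segments_table
SET begin_mileage = (segment_data->'summary'->>'begin_mileage')::numeric,
    end_mileage   = (segment_data->'summary'->>'end_mileage')::numeric;

-- The query then reads plain columns only.
SELECT
    segment AS segment_no,
    begin_time,
    begin_mileage,
    end_mileage
FROM public.segments_table
WHERE vehicle = 12 AND trip = 3
ORDER BY begin_time;
```

The drawback is that the extracted columns have to be kept in sync with segment_data on every insert and update.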
Sort (cost=13.25..13.27 rows=10 width=28) (actual time=1999.899..1999.901 rows=71 loops=1)
Sort Key: begin_time
Sort Method: quicksort Memory: 35kB
Buffers: shared hit=5663
-> Bitmap Heap Scan on segments_table (cost=4.38..13.08 rows=10 width=28) (actual time=1.332..1999.730 rows=71 loops=1)
Recheck Cond: ((vehicle = 644) AND (trip = 3))
Heap Blocks: exact=3
Buffers: shared hit=5663
-> Bitmap Index Scan on segments_table_vehicle_64df5bc5_uniq (cost=0.00..4.38 rows=10 width=0) (actual time=0.052..0.052 rows=71 loops=1)
Index Cond: ((vehicle = 644) AND (trip = 3))
Buffers: shared hit=2
Planning time: 0.368 ms
Execution time: 2000.000 ms
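Another interesting observation: when I run the same query multiple times, I see none of the improvement I would expect from caching on subsequent identical queries.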
The only changes I made to the stock Postgres server configuration were increasing shared_buffers from 128MB to 256MB and setting effective_cache_size = 1GB. I also reduced max_connections from 100 to 20.
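One way to express these three settings is sketched below with ALTER SYSTEM (available in PostgreSQL 9.4 and later). This is an illustration of the values listed above, not necessarily how they were applied on this server; editing postgresql.conf directly has the same effect.

```sql
-- Values taken from the description above.
ALTER SYSTEM SET shared_buffers = '256MB';        -- takes effect after a server restart
ALTER SYSTEM SET effective_cache_size = '1GB';    -- takes effect after a reload
ALTER SYSTEM SET max_connections = 20;            -- takes effect after a server restart

-- Reload the configuration, then check a value that does not need a restart.
SELECT pg_reload_conf();
SHOW effective_cache_size;
```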
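The results above were obtained on Windows 7 with an 8-core i7 processor. The same test was also run on Ubuntu with a dual-core CPU, and the query time was roughly the same: 2.2 seconds (with the two JSONB fields in the SELECT clause).
Update:
A single JSON field in the SELECT clause: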
EXPLAIN (ANALYZE,buffers)
SELECT
segment as segment_no,
begin_time,
segment_data::json->'summary'->'end_mileage' as end_mileage
FROM
segments_table
WHERE
vehicle=644 AND trip=3
ORDER BY
begin_time;
Result:
Sort (cost=13.15..13.17 rows=10 width=28) (actual time=999.695..999.696 rows=71 loops=1)
Sort Key: begin_time
Sort Method: quicksort Memory: 26kB
Buffers: shared hit=2834
-> Bitmap Heap Scan on segments_table (cost=4.38..12.98 rows=10 width=28) (actual time=0.781..999.554 rows=71 loops=1)
Recheck Cond: ((vehicle = 644) AND (trip = 3))
Heap Blocks: exact=3
Buffers: shared hit=2834
-> Bitmap Index Scan on segments_table_vehicle_64df5bc5_uniq (cost=0.00..4.38 rows=10 width=0) (actual time=0.052..0.052 rows=71 loops=1)
Index Cond: ((vehicle = 644) AND (trip = 3))
Buffers: shared hit=2
Planning time: 0.353 ms
Execution time: 999.777 ms
No JSON fields in the SELECT clause:
EXPLAIN (ANALYZE,buffers)
SELECT
segment as segment_no,
begin_time
FROM
segments_table
WHERE
vehicle=644 AND trip=3
ORDER BY
begin_time;
Result:
Sort (cost=13.05..13.07 rows=10 width=10) (actual time=0.194..0.205 rows=71 loops=1)
Sort Key: begin_time
Sort Method: quicksort Memory: 19kB
Buffers: shared hit=5
-> Bitmap Heap Scan on segments_table (cost=4.38..12.88 rows=10 width=10) (actual time=0.088..0.122 rows=71 loops=1)
Recheck Cond: ((vehicle = 644) AND (trip = 3))
Heap Blocks: exact=3
Buffers: shared hit=5
-> Bitmap Index Scan on segments_table_vehicle_64df5bc5_uniq (cost=0.00..4.38 rows=10 width=0) (actual time=0.048..0.048 rows=71 loops=1)
Index Cond: ((vehicle = 644) AND (trip = 3))
Buffers: shared hit=2
Planning time: 0.590 ms
Execution time: 0.280 ms
Table definition:
CREATE TABLE public.segments_table
(
segment_id integer NOT NULL DEFAULT nextval('segments_table_segment_id_seq'::regclass),
vehicle smallint NOT NULL,
trip smallint NOT NULL,
segment smallint NOT NULL,
begin_time timestamp without time zone NOT NULL,
segment_data jsonb,
CONSTRAINT segments_table_pkey PRIMARY KEY (segment_id),
CONSTRAINT segments_table_vehicle_64df5bc5_uniq UNIQUE (vehicle, trip, segment, begin_time)
)
WITH (
OIDS=FALSE
);
CREATE INDEX segments
ON public.segments_table
USING btree
(segment);
CREATE INDEX vehicles
ON public.segments_table
USING btree
(vehicle);
CREATE INDEX trips
ON public.segments_table
USING btree
(trip);
Update #2:
Fixing the cast issue pointed out by @Mark_M, that is, changing json to jsonb, reduces the query time from 2 seconds to 300 ms:
EXPLAIN (ANALYZE,buffers)
SELECT
segment as segment_no,
begin_time,
segment_data::jsonb->'summary'->'begin_mileage' as begin_mileage,
segment_data::jsonb->'summary'->'end_mileage' as end_mileage
FROM
segments_table
WHERE
vehicle=644 AND trip=3
ORDER BY
begin_time;
Sort (cost=13.15..13.17 rows=10 width=28) (actual time=296.339..296.342 rows=71 loops=1)
Sort Key: begin_time
Sort Method: quicksort Memory: 35kB
Buffers: shared hit=5663
-> Bitmap Heap Scan on segments_table (cost=4.38..12.98 rows=10 width=28) (actual time=0.275..296.229 rows=71 loops=1)
Recheck Cond: ((vehicle = 644) AND (trip = 3))
Heap Blocks: exact=3
Buffers: shared hit=5663
-> Bitmap Index Scan on segments_table_vehicle_64df5bc5_uniq (cost=0.00..4.38 rows=10 width=0) (actual time=0.045..0.045 rows=71 loops=1)
Index Cond: ((vehicle = 644) AND (trip = 3))
Buffers: shared hit=2
Planning time: 0.352 ms
Execution time: 296.473 ms
That is much better, but it is still 18x slower than selecting only the non-JSON fields. Is this a reasonable performance hit for using JSONB fields?
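Since segment_data is already declared as jsonb in the table definition, the explicit cast should be a no-op and can be dropped. Below is a sketch of the same query without the cast, using the #> path operator in place of the chained -> operators. It is only an equivalent formulation; the numbers above do not establish whether it changes the timing.

```sql
-- segment_data is jsonb per the table definition, so no cast is needed.
-- #> takes a text[] path and returns jsonb, like the chained -> operators.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    segment AS segment_no,
    begin_time,
    segment_data #> '{summary,begin_mileage}' AS begin_mileage,
    segment_data #> '{summary,end_mileage}'   AS end_mileage
FROM segments_table
WHERE vehicle = 644 AND trip = 3
ORDER BY begin_time;
```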