为什么在Postgres中查找JSONB字段会将查询时间增加62倍?

时间:2017-09-20 18:25:34

标签: json postgresql database-design database-performance database-caching

我正在运行一个从单个表中检索数据的简单查询。如果我只查找表中的非JSON字段,则查询需要16毫秒。如果我包含引用JSONB数据中的字段的字段,那么它会增加62倍。如果我查找两个不同的JSONB字段,那么加倍。

--EXPLAIN (ANALYZE,buffers)
SELECT 
  segment as segment_no,
  begin_time,
  segment_data::json->'summary'->'begin_milage' as begin_milage,
  segment_data::json->'summary'->'end_milage' as end_milage
FROM
  segments_table
WHERE
  vehicle=12 AND trip=3
ORDER BY
  begin_time;

查询需要2.0秒,SELECT子句中包含两个JSON字段。如果省略一个需要1.0秒,如果省略两个JSON字段,则查询只需要16毫秒。

该表本身有大约700条记录。该查询返回83条记录。运行不同的查询我注意到,在查询2个JSON字段时,查询完成所需的时间越长(大约0.0066 * X 1.32 ms)。

我尝试为车辆和行程查找添加索引,但这并没有太大的区别(正如预期的那样)。它似乎是对数据的实际检索,并且在JSONB字段中查找数据需要时间。现在,如果在WHERE子句中需要JSON字段,那么看到这种降级会更容易理解,但事实并非如此。

一个简单的解决方案当然是将每个字段从JSON blob中拉出来,并在表中为此创建单独的字段。但在我走这条路之前还有什么能解决这个性能问题吗?

以下是ANALYZE的结果:

Sort  (cost=13.25..13.27 rows=10 width=28) (actual time=1999.899..1999.901 rows=71 loops=1)
  Sort Key: begin_time
  Sort Method: quicksort  Memory: 35kB
  Buffers: shared hit=5663
  ->  Bitmap Heap Scan on segments_table  (cost=4.38..13.08 rows=10 width=28) (actual time=1.332..1999.730 rows=71 loops=1)
        Recheck Cond: ((vehicle = 644) AND (trip = 3))
        Heap Blocks: exact=3
        Buffers: shared hit=5663
        ->  Bitmap Index Scan on segments_table_vehicle_64df5bc5_uniq  (cost=0.00..4.38 rows=10 width=0) (actual time=0.052..0.052 rows=71 loops=1)
              Index Cond: ((vehicle = 644) AND (trip = 3))
              Buffers: shared hit=2
Planning time: 0.368 ms
Execution time: 2000.000 ms

另一个有趣的观察是,多次运行相同的查询,我发现在后续相同的查询中我预计不会对缓存进行任何改进。

我对库存postgres服务器配置所做的唯一修改是将shared_buffers从128MB增加到256MB并设置effective_cache_size = 1GB。我还将max_connections从100减少到了20。

以上结果在8核i7处理器的Win7下运行。在双核CPU上也在Ubuntu下进行了相同的测试,查询大致相同:2.2秒(在SELECT子句中包含两个JSONB字段时)。

更新

SELECT clasuse中的单个JSON字段:

EXPLAIN (ANALYZE,buffers)
SELECT 
  segment as segment_no,
  begin_time, 
  segment_data::json->'summary'->'end_mileage' as end_mileage
FROM
  segments_table
WHERE
  vehicle=644 AND trip=3
ORDER BY
  begin_time;

结果:

Sort  (cost=13.15..13.17 rows=10 width=28) (actual time=999.695..999.696 rows=71 loops=1)
  Sort Key: begin_time
  Sort Method: quicksort  Memory: 26kB
  Buffers: shared hit=2834
  ->  Bitmap Heap Scan on segments_table  (cost=4.38..12.98 rows=10 width=28) (actual time=0.781..999.554 rows=71 loops=1)
        Recheck Cond: ((vehicle = 644) AND (trip = 3))
        Heap Blocks: exact=3
        Buffers: shared hit=2834
        ->  Bitmap Index Scan on segments_table_vehicle_64df5bc5_uniq  (cost=0.00..4.38 rows=10 width=0) (actual time=0.052..0.052 rows=71 loops=1)
              Index Cond: ((vehicle = 644) AND (trip = 3))
              Buffers: shared hit=2
Planning time: 0.353 ms
Execution time: 999.777 ms

SELECT子句中没有JSON字段:

EXPLAIN (ANALYZE,buffers)
SELECT 
  segment as segment_no,
  begin_time
FROM
  segments_table
WHERE
  vehicle=644 AND trip=3
ORDER BY
  begin_time;

结果:

Sort  (cost=13.05..13.07 rows=10 width=10) (actual time=0.194..0.205 rows=71 loops=1)
  Sort Key: begin_time
  Sort Method: quicksort  Memory: 19kB
  Buffers: shared hit=5
  ->  Bitmap Heap Scan on segments_table  (cost=4.38..12.88 rows=10 width=10) (actual time=0.088..0.122 rows=71 loops=1)
        Recheck Cond: ((vehicle = 644) AND (trip = 3))
        Heap Blocks: exact=3
        Buffers: shared hit=5
        ->  Bitmap Index Scan on segments_table_vehicle_64df5bc5_uniq  (cost=0.00..4.38 rows=10 width=0) (actual time=0.048..0.048 rows=71 loops=1)
              Index Cond: ((vehicle = 644) AND (trip = 3))
              Buffers: shared hit=2
Planning time: 0.590 ms
Execution time: 0.280 ms

表格定义:

CREATE TABLE public.segments_table
(
  segment_id integer NOT NULL DEFAULT nextval('segments_table_segment_id_seq'::regclass),
  vehicle smallint NOT NULL,
  trip smallint NOT NULL,
  segment smallint NOT NULL,
  begin_time timestamp without time zone NOT NULL,
  segment_data jsonb,
  CONSTRAINT segments_table_pkey PRIMARY KEY (segment_id),
  CONSTRAINT segments_table_vehicle_64df5bc5_uniq UNIQUE (vehicle, trip, segment, begin_time)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX segments
  ON public.segments_table
  USING btree
  (segment);

CREATE INDEX vehicles
  ON public.segments_table
  USING btree
  (vehicle);

CREATE INDEX trips
  ON public.segments_table
  USING btree
  (trip);

更新#2:

修复了@Mark_M指出的强制转换问题,将json更改为jsonb`会将查询时间从2秒减少到300毫秒:

EXPLAIN (ANALYZE,buffers)
SELECT 
  segment as segment_no,
  begin_time,
  segment_data::jsonb->'summary'->'begin_mileage' as begin_mileage,
  segment_data::jsonb->'summary'->'end_mileage' as end_mileage
FROM
  segments_table
WHERE
  vehicle=644 AND trip=3
ORDER BY
  begin_time;
  Sort  (cost=13.15..13.17 rows=10 width=28) (actual time=296.339..296.342 rows=71 loops=1)
  Sort Key: begin_time
  Sort Method: quicksort  Memory: 35kB
  Buffers: shared hit=5663
  ->  Bitmap Heap Scan on segments_table  (cost=4.38..12.98 rows=10 width=28) (actual time=0.275..296.229 rows=71 loops=1)
        Recheck Cond: ((vehicle = 644) AND (trip = 3))
        Heap Blocks: exact=3
        Buffers: shared hit=5663
        ->  Bitmap Index Scan on segments_table_vehicle_64df5bc5_uniq  (cost=0.00..4.38 rows=10 width=0) (actual time=0.045..0.045 rows=71 loops=1)
              Index Cond: ((vehicle = 644) AND (trip = 3))
              Buffers: shared hit=2
Planning time: 0.352 ms
Execution time: 296.473 ms

虽然改进了很多,但仍然是18x只是使用非JSON字段查找,但这要好得多。这对于使用JSONB字段是否是一个合理的性能影响?

0 个答案:

没有答案