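I have a PostgreSQL 9.6 database that records application debug logs. It contains 130 million records. The main column is of type jsonb and has a GIN index.
If I run a query like the following, it executes quickly: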
select id, logentry from inettklog where
logentry @> '{"instance":"1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb;
Here is the EXPLAIN ANALYZE output:
Bitmap Heap Scan on inettklog (cost=2938.03..491856.81 rows=137552 width=300) (actual time=10.610..12.644 rows=128 loops=1)
Recheck Cond: (logentry @> '{"instance": "1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb)
Heap Blocks: exact=128
-> Bitmap Index Scan on inettklog_ix_logentry (cost=0.00..2903.64 rows=137552 width=0) (actual time=10.564..10.564 rows=128 loops=1)
Index Cond: (logentry @> '{"instance": "1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb)
Planning time: 68.522 ms
Execution time: 12.720 ms
(7 rows)
However, if I merely add a LIMIT, it suddenly becomes very slow:
select id, logentry from inettklog where
logentry @> '{"instance":"1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb
limit 20;
Now it takes more than 20 seconds!
Limit (cost=0.00..1247.91 rows=20 width=300) (actual time=0.142..37791.319 rows=20 loops=1)
-> Seq Scan on inettklog (cost=0.00..8582696.05 rows=137553 width=300) (actual time=0.141..37791.308 rows=20 loops=1)
Filter: (logentry @> '{"instance": "1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb)
Rows Removed by Filter: 30825572
Planning time: 0.174 ms
Execution time: 37791.351 ms
(6 rows)
Here are the results when an ORDER BY is included, even with enable_seqscan = off.
Without LIMIT:
set enable_seqscan = off;
set enable_indexscan = on;
select id, date, logentry from inettklog where
logentry @> '{"instance":"1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb
order by date;
EXPLAIN ANALYZE:
Sort (cost=523244.24..523588.24 rows=137600 width=308) (actual time=48.196..48.219 rows=128 loops=1)
Sort Key: date
Sort Method: quicksort Memory: 283kB
-> Bitmap Heap Scan on inettklog (cost=2658.40..491746.00 rows=137600 width=308) (actual time=31.773..47.865 rows=128 loops=1)
Recheck Cond: (logentry @> '{"instance": "1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb)
Heap Blocks: exact=128
-> Bitmap Index Scan on inettklog_ix_logentry (cost=0.00..2624.00 rows=137600 width=0) (actual time=31.550..31.550 rows=128 loops=1)
Index Cond: (logentry @> '{"instance": "1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb)
Planning time: 0.181 ms
Execution time: 48.254 ms
(10 rows)
Now, when we add the LIMIT:
set enable_seqscan = off;
set enable_indexscan = on;
select id, date, logentry from inettklog where
logentry @> '{"instance":"1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb
order by date
limit 20;
Now it takes 90 seconds!
Limit (cost=0.57..4088.36 rows=20 width=308) (actual time=32017.438..98544.017 rows=20 loops=1)
-> Index Scan using inettklog_ix_logdate on inettklog (cost=0.57..28123416.21 rows=137597 width=308) (actual time=32017.437..98544.008 rows=20 loops=1)
Filter: (logentry @> '{"instance": "1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb)
Rows Removed by Filter: 27829853
Planning time: 0.249 ms
Execution time: 98544.043 ms
(6 rows)
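This is all very confusing! I want to provide a utility for querying this database quickly, but all of this is counterintuitive.
Can anyone explain what is going on? Can anyone explain the rules?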
Answer 0 (score: 0)
The estimates are way off. Try running ANALYZE, possibly after increasing default_statistics_target.
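A minimal sketch of that suggestion, using the table from the question. The target value 1000 is an assumption chosen for illustration, not a recommendation. It may also not change the estimate for @> on jsonb: the estimate of 137552 rows is roughly 0.1% of the table, which looks like a default selectivity rather than one derived from column statistics.

```sql
-- Raise the statistics target for this session only (1000 is an assumed value).
SET default_statistics_target = 1000;

-- Re-gather planner statistics for the table.
ANALYZE inettklog;

-- Compare the estimated row count ("rows=") with the actual one again.
EXPLAIN ANALYZE
SELECT id, logentry
FROM inettklog
WHERE logentry @> '{"instance":"1.3.46.670589.11.0.0.11.4.2.0.8743.5.5396.2006120114440692624"}'::jsonb
LIMIT 20;
```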
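Because PostgreSQL thinks there are so many matching rows (it estimates 137552, while only 128 actually match), it decides that it is best to perform a sequential scan and stop as soon as it has found enough results.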
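The numbers in the plans above make this reasoning visible. As a rough sketch, the cost of a Limit node can be approximated as the child's total cost scaled by the fraction of rows it expects to read (the sequential scan's startup cost is 0 here). All inputs below are copied from the question's EXPLAIN output.

```sql
-- Seq scan total cost * (LIMIT / estimated matching rows):
-- this reproduces the Limit cost shown in the slow plan.
SELECT round(8582696.05 * 20 / 137553, 2) AS limit_cost_estimated;  -- 1247.91

-- The same formula with the actual number of matching rows (128):
-- far more expensive than the bitmap scan's total cost of 491856.81.
SELECT round(8582696.05 * 20 / 128, 2) AS limit_cost_actual_rows;   -- 1341046.26
```

With the inflated estimate, 1247.91 looks much cheaper than the bitmap heap scan's 491856.81, so the sequential scan wins. With a row estimate close to 128, it presumably would not.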
Answer 1 (score: 0)
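Using LIMIT without an index slows the query down, because it scans the whole table before giving you the result. So instead of doing that, create an index on logentry and then run the query with the LIMIT. It will give you faster results.
You can refer to the following answer: PostgreSQL query very slow with limit 1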
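For illustration only, creating a GIN index on logentry could look like the sketch below. The table in the question already has one (inettklog_ix_logentry), so this step alone may not change the plan. The index name here is hypothetical, and jsonb_path_ops is an assumed choice of operator class: it supports only the @> containment operator.

```sql
-- Hypothetical index name. CONCURRENTLY avoids blocking writes during the
-- build, but cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY inettklog_ix_logentry_path
    ON inettklog USING gin (logentry jsonb_path_ops);
```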