I have a transaction table partitioned by day. In a large environment, a single day's partition takes up 5 GB of disk space and holds roughly 5,000,000 rows.
The following query over a 24-hour range takes more than 5 minutes, even though it is using an index.
What can be done to improve this?
EXPLAIN ANALYZE
SELECT * FROM transactions
WHERE end_time > 1488970800000
AND end_time <= 1489057200000
AND synthetic_application_id = 1
ORDER BY insertion_time DESC
LIMIT 2000;
Limit  (cost=257809.85..257814.85 rows=2000 width=485) (actual time=323745.024..323758.412 rows=2000 loops=1)
  ->  Sort  (cost=257809.85..257818.83 rows=3592 width=485) (actual time=323745.008..323749.762 rows=2000 loops=1)
        Sort Key: transactions.insertion_time
        Sort Method: top-N heapsort  Memory: 1628kB
        ->  Append  (cost=0.00..257597.73 rows=3592 width=485) (actual time=879.457..323670.299 rows=4608 loops=1)
              ->  Seq Scan on transactions  (cost=0.00..0.00 rows=1 width=2646) (actual time=0.004..0.004 rows=0 loops=1)
                    Filter: ((end_time > 1488970800000::bigint) AND (end_time <= 1489057200000::bigint) AND (application_id = 1))
              ->  Index Scan using transactions_p2017_end_time_applicati_idx13 on transactions_p2017_03_08  (cost=0.56..123142.03 rows=1698 width=470) (actual time=879.085..167714.455 rows=2112 loops=1)
                    Index Cond: ((end_time > 1488970800000::bigint) AND (end_time <= 1489057200000::bigint) AND (application_id = 1))
              ->  Index Scan using transactions_p2017_end_time_applicati_idx14 on transactions_p2017_03_09  (cost=0.56..134271.47 rows=1871 width=490) (actual time=395.117..155920.754 rows=2496 loops=1)
                    Index Cond: ((end_time > 1488970800000::bigint) AND (end_time <= 1489057200000::bigint) AND (application_id = 1))
Planning time: 198.866 ms
Execution time: 323765.693 ms
Adding a second plan, captured with EXPLAIN (ANALYZE, BUFFERS, TIMING). Some of the data may already have been loaded into the cache, so the numbers look better. (As far as I know, there is no way to clear the cache on Windows.)
"Limit (cost=227818.94..227823.94 rows=2000 width=474) (actual time=139343.951..139356.216 rows=2000 loops=1)"
" Buffers: shared hit=795 read=40933 written=246"
" -> Sort (cost=227818.94..227830.39 rows=4579 width=474) (actual time=139343.943..139348.214 rows=2000 loops=1)"
" Sort Key: transactions.insertion_time"
" Sort Method: top-N heapsort Memory: 1628kB"
" Buffers: shared hit=795 read=40933 written=246"
" -> Append (cost=0.00..227544.98 rows=4579 width=474) (actual time=733.521..139240.611 rows=4608 loops=1)"
" Buffers: shared hit=795 read=40933 written=246"
" -> Seq Scan on transactions (cost=0.00..0.00 rows=1 width=2646) (actual time=0.004..0.004 rows=0 loops=1)"
" Filter: ((end_time > 1488891600000::bigint) AND (end_time <= 1488978000000::bigint) AND (application_id = 1))"
"
" -> Index Scan using transactions_p2017_end_time_applicati_idx12 on transactions_p2017_03_07 (cost=0.56..101500.07 rows=2134 width=471) (actual time=733.351..120950.487 rows=1728 loops=1)"
" Index Cond: ((end_time > 1488891600000::bigint) AND (end_time <= 1488978000000::bigint) AND (application_id = 1))"
" Buffers: shared hit=263 read=19902 written=123"
" -> Index Scan using transactions_p2017_end_time_applicati_idx13 on transactions_p2017_03_08 (cost=0.56..125860.68 rows=2422 width=470) (actual time=114.143..18262.152 rows=2880 loops=1)"
" Index Cond: ((end_time > 1488891600000::bigint) AND (end_time <= 1488978000000::bigint) AND (application_id = 1))"
" Buffers: shared hit=498 read=21011 written=123"
"
"Planning time: 23.858 ms"
"Execution time: 139362.264 ms"
Answer 0 (score: 1)
Create an index on (synthetic_application_id, end_time) and see whether that improves the index scan time.
Your storage also seems to be slow.
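A minimal sketch of what that might look like on one of the daily partitions from the plan (the index name is illustrative; note the question's query filters on synthetic_application_id while the plan output shows application_id, so adjust to whichever column actually exists):

CREATE INDEX transactions_p2017_03_08_app_end_idx
    ON transactions_p2017_03_08 (synthetic_application_id, end_time);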
Answer 1 (score: 0)
Here is a checklist:

- Run vacuum analyze verbose. Note how many rows it reports as nonremovable, and see whether performance improves afterwards.
- Run explain analyze verbose; it provides far more information than what we can currently see. (Both commands are sketched below.)

I would also like to add a note explaining Laurentz's answer and why it may solve your problem. If your index is on (end_time, application_id), then it has to check every end_time in the range against the application_ids, and you will most likely get a lot of misses. If, on the other hand, you can check application_id first, you may avoid examining many end_time records. So this may well solve your problem. (If you find this useful, you should upvote or accept his answer.)
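A minimal sketch of the two checklist commands, reusing the table and query from the question unchanged:

VACUUM (ANALYZE, VERBOSE) transactions;

EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT * FROM transactions
WHERE end_time > 1488970800000
  AND end_time <= 1489057200000
  AND synthetic_application_id = 1
ORDER BY insertion_time DESC
LIMIT 2000;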
Note, however, that you will need to create the index on each partition; a sketch of automating that follows.
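The Append over per-day child tables in the plans suggests inheritance-based partitioning (pre-PostgreSQL 10), where each child needs its own index. A sketch under that assumption, with illustrative index names:

DO $$
DECLARE
    part text;
BEGIN
    -- Loop over every child table of "transactions".
    FOR part IN
        SELECT c.relname
        FROM pg_inherits i
        JOIN pg_class c ON c.oid = i.inhrelid
        WHERE i.inhparent = 'transactions'::regclass
    LOOP
        -- Create the (synthetic_application_id, end_time) index on each partition.
        EXECUTE format(
            'CREATE INDEX IF NOT EXISTS %I ON %I (synthetic_application_id, end_time)',
            part || '_app_end_idx', part);
    END LOOP;
END $$;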
Answer 2 (score: -1)
Check the work_mem setting in Postgres. If it cannot load the entire index into memory, you may be thrashing the disk, which will slow things down considerably.
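A quick way to inspect and raise the setting for one session (the value is illustrative; work_mem bounds per-operation sort memory, such as the top-N heapsort in the plans above):

SHOW work_mem;
SET work_mem = '256MB';
-- Or persist the change cluster-wide and reload the configuration:
ALTER SYSTEM SET work_mem = '256MB';
SELECT pg_reload_conf();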