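My table has more than 200,000,000 tuples, and I frequently have to run the following query and display the result on a web page, which takes a long time: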
select distinct(source), count(hitid) from tb_hit group by source;
I have created an index, but the query does not use it:
CREATE INDEX tb_hit_idx_5 on tb_hit USING btree (HitId ASC,Source ASC);
Here is the query plan:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=10702925.57..10702925.62 rows=6 width=13) (actual time=330574.690..330574.705 rows=7 loops=1)
-> Sort (cost=10702925.57..10702925.59 rows=6 width=13) (actual time=330574.689..330574.691 rows=7 loops=1)
Sort Key: source, (count(hitid))
Sort Method: quicksort Memory: 25kB
-> Finalize GroupAggregate (cost=10702919.26..10702925.50 rows=6 width=13) (actual time=330574.507..330574.647 rows=7 loops=1)
Group Key: source
-> Gather Merge (cost=10702919.26..10702925.20 rows=48 width=13) (actual time=330574.454..330588.679 rows=63 loops=1)
Workers Planned: 8
Workers Launched: 8
-> Sort (cost=10701919.12..10701919.13 rows=6 width=13) (actual time=330561.376..330561.378 rows=7 loops=9)
Sort Key: source
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
Worker 2: Sort Method: quicksort Memory: 25kB
Worker 3: Sort Method: quicksort Memory: 25kB
Worker 4: Sort Method: quicksort Memory: 25kB
Worker 5: Sort Method: quicksort Memory: 25kB
Worker 6: Sort Method: quicksort Memory: 25kB
Worker 7: Sort Method: quicksort Memory: 25kB
-> Partial HashAggregate (cost=10701918.98..10701919.04 rows=6 width=13) (actual time=330561.260..330561.265 rows=7 loops=9)
Group Key: source
-> Parallel Seq Scan on tb_hit (cost=0.00..10523012.32 rows=35781332 width=13) (actual time=4.019..303398.636 rows=31814705 loops=9)
After running set enable_seqscan = OFF; this is the EXPLAIN output:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=16625420.17..16625420.22 rows=6 width=13) (actual time=393693.931..393693.940 rows=7 loops=1)
-> Sort (cost=16625420.17..16625420.19 rows=6 width=13) (actual time=393693.929..393693.930 rows=7 loops=1)
Sort Key: source, (count(hitid))
Sort Method: quicksort Memory: 25kB
-> Finalize GroupAggregate (cost=16625413.86..16625420.10 rows=6 width=13) (actual time=393693.825..393693.902 rows=7 loops=1)
Group Key: source
-> Gather Merge (cost=16625413.86..16625419.80 rows=48 width=13) (actual time=393693.784..395576.863 rows=63 loops=1)
Workers Planned: 8
Workers Launched: 8
-> Sort (cost=16624413.72..16624413.73 rows=6 width=13) (actual time=393680.090..393680.092 rows=7 loops=9)
Sort Key: source
Sort Method: quicksort Memory: 25kB
Worker 0: Sort Method: quicksort Memory: 25kB
Worker 1: Sort Method: quicksort Memory: 25kB
Worker 2: Sort Method: quicksort Memory: 25kB
Worker 3: Sort Method: quicksort Memory: 25kB
Worker 4: Sort Method: quicksort Memory: 25kB
Worker 5: Sort Method: quicksort Memory: 25kB
Worker 6: Sort Method: quicksort Memory: 25kB
Worker 7: Sort Method: quicksort Memory: 25kB
-> Partial HashAggregate (cost=16624413.58..16624413.64 rows=6 width=13) (actual time=393679.954..393679.959 rows=7 loops=9)
Group Key: source
-> Parallel Bitmap Heap Scan on tb_hit (cost=5922341.42..16445455.86 rows=35791544 width=13) (actual time=52043.284..367453.059 rows=31814705 loops=9)
Heap Blocks: exact=1216152
-> Bitmap Index Scan on tb_hit_idx_5 (cost=0.00..5850758.33 rows=286332352 width=0) (actual time=40833.844..40833.844 rows=286332344 loops=1)
Planning Time: 0.366 ms
Execution Time: 395577.824 ms
(27 rows)
Answer 0 (score: 0)
First: the DISTINCT is redundant here and should be removed. GROUP BY already guarantees that each source appears only once.
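As a minimal sketch, the query from the question with the redundant DISTINCT removed would look like this. It returns the same rows, and it should drop the Unique step and the extra Sort on (source, count(hitid)) from the plan, but it still has to scan the whole table:

```sql
-- Same result as the original query: GROUP BY already yields one row per source.
SELECT source, count(hitid)
FROM tb_hit
GROUP BY source;
```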
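DISTINCT is often a performance problem, but here matters are simpler: the sheer number of rows determines the execution time.
There is no way around reading every row, so an index will not help here.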
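What you can do is create a summary table that contains the desired result and keep it up to date with triggers whenever the underlying table is modified, so that the counts are always accurate.
You can then query that summary table, which will be very fast. The price you pay is the run time of the triggers during data modifications.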
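The summary-table approach could be sketched roughly as follows. This is an illustration under assumptions, not a tested drop-in solution: the names tb_hit_summary, hit_count, tb_hit_summary_maintain and tb_hit_summary_trg are made up for the example, and it assumes that source is a non-null text column and that hitid is never NULL (so that count(hitid) equals the number of rows per source).

```sql
-- Hypothetical summary table: one row per source with its current count.
CREATE TABLE tb_hit_summary (
    source    text   PRIMARY KEY,   -- assumption: source is text and NOT NULL
    hit_count bigint NOT NULL
);

-- Initial population. Run this while nothing else modifies tb_hit
-- (for example in the same transaction that creates the trigger below,
-- after locking the table), otherwise the counts can start out wrong.
INSERT INTO tb_hit_summary (source, hit_count)
SELECT source, count(hitid)
FROM tb_hit
GROUP BY source;

-- Row-level trigger function that keeps the counts in step with tb_hit.
CREATE FUNCTION tb_hit_summary_maintain() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    -- A new row, or a row whose source changed: count it under NEW.source.
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO tb_hit_summary (source, hit_count)
        VALUES (NEW.source, 1)
        ON CONFLICT (source) DO UPDATE
            SET hit_count = tb_hit_summary.hit_count + 1;
    END IF;

    -- A removed row, or a row whose source changed: uncount it under OLD.source.
    IF TG_OP IN ('DELETE', 'UPDATE') THEN
        UPDATE tb_hit_summary
        SET hit_count = hit_count - 1
        WHERE source = OLD.source;
    END IF;

    RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END;
$$;

-- EXECUTE FUNCTION needs PostgreSQL 11 or later;
-- older versions spell it EXECUTE PROCEDURE.
CREATE TRIGGER tb_hit_summary_trg
AFTER INSERT OR UPDATE OF source OR DELETE ON tb_hit
FOR EACH ROW EXECUTE FUNCTION tb_hit_summary_maintain();

-- The web page can then read the precomputed counts directly.
SELECT source, hit_count
FROM tb_hit_summary;
```

Two caveats with this sketch: TRUNCATE on tb_hit is not handled, and because there are only a handful of distinct source values, concurrent writers will contend for the same few summary rows. That contention is part of the price paid during data modification.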