我们有一个带有多个表和左外部联接的PostgreSQL查询,并且运行速度非常慢。 它需要25到40秒才能完成,因此我们想对其进行更多优化,并将运行时间减少到1-2秒。
select a.campaignid, b.campaign_name , case when b.message_type_id = 1 then 'Promotional'
when b.message_type_id = 2 then 'Transactional'
else 'Other' end as Campaign_type, c.username , aggregator_type,
e.cli_manager_id as senderID,
b.schedule_time as campaign_schedule_date,
count(a.mobile) as campaign_submitted_count, count(case when a.status = 'DELIVRD' then mobile end) as Delivered,
count(a.mobile) as Total_count,
count(case when a.status = 'FAILED' then mobile end) as failure_count,
count(case when a.status = 'DND_check_failed' then mobile end) as DND_count,
sum(credits_used) as credits_used
from tbl_cdr_test a left outer join tbl_campaign b
on a.campaignid = b.tbl_campaign_id left outer join tbl_users_master c
on b.user_id =c.user_master_id
left outer join tbl_cli_manager e on b.user_id = e.user_id
left outer join tbl_user_channel f on b.user_id =f.user_id
left outer join tbl_user_configurations g on b.user_id = g.user_id
where date(insert_datetime) between '2020-05-23' and '2020-06-23'
and c.username = coalesce(null, c.username)
and g.msg_cat_id = coalesce(null, g.msg_cat_id)
and a.campaignid = coalesce(null, a.campaignid)
and e.cli_manager_id = coalesce(null, e.cli_manager_id)
group by a.campaignid, b.campaign_name , b.message_type_id,c.username , b.schedule_time,
aggregator_type, e.cli_manager_id;
我们也创建了适当的索引,但是仍然需要时间。
此外,执行计划中有一种“外部合并磁盘”排序方法,而要解决该问题,我将work_mem = 50MB设置为。仍然使用磁盘排序而不是内存。请建议
下面是执行计划:
GroupAggregate (cost=4872.01..4872.07 rows=1 width=543) (actual time=20564.239..27415.264 rows=8 loops=1)
Group Key: a.campaignid, b.campaign_name, b.message_type_id, c.username, b.schedule_time, f.aggregator_type, e.cli_manager_id
-> Sort (cost=4872.01..4872.01 rows=1 width=483) (actual time=19627.424..25020.702 rows=3206196 loops=1)
Sort Key: a.campaignid, b.campaign_name, b.message_type_id, c.username, b.schedule_time, f.aggregator_type, e.cli_manager_id
Sort Method: external merge Disk: 281456kB
-> Nested Loop (cost=22.03..4872.00 rows=1 width=483) (actual time=99.704..12086.244 rows=3206196 loops=1)
Join Filter: (b.user_id = g.user_id)
-> Nested Loop Left Join (cost=21.89..4871.79 rows=1 width=495) (actual time=99.688..4518.533 rows=3206196 loops=1)
-> Nested Loop (cost=21.75..4871.54 rows=1 width=77) (actual time=99.664..935.689 rows=356244 loops=1)
-> Nested Loop (cost=21.33..31.57 rows=1 width=65) (actual time=0.295..2.376 rows=588 loops=1)
Join Filter: (b.user_id = c.user_master_id)
-> Merge Join (cost=21.18..30.22 rows=6 width=46) (actual time=0.246..0.663 rows=588 loops=1)
Merge Cond: (e.user_id = b.user_id)
-> Index Scan using "idx_FK_7hc6agd_tbl_cli_ma_1592228110_32" on tbl_cli_manager e (cost=0.42..6281.84 rows=762 width=12) (actual time=0.014..0.035 rows=5 loops=1)
Filter: (cli_manager_id = COALESCE(cli_manager_id))
-> Sort (cost=20.76..21.13 rows=147 width=34) (actual time=0.225..0.333 rows=585 loops=1)
Sort Key: b.user_id
Sort Method: quicksort Memory: 36kB
-> Seq Scan on tbl_campaign b (cost=0.00..15.47 rows=147 width=34) (actual time=0.013..0.154 rows=147 loops=1)
-> Index Scan using ind_user_master_c_user on tbl_users_master c (cost=0.14..0.21 rows=1 width=19) (actual time=0.002..0.002 rows=1 loops=588)
Index Cond: (user_master_id = e.user_id)
Filter: ((username)::text = (COALESCE(username))::text)
-> Append (cost=0.42..4839.94 rows=3 width=20) (actual time=0.546..1.426 rows=606 loops=588)
-> Index Scan using testh11_campaignid_idx on testh11 a (cost=0.42..4253.99 rows=2 width=20) (actual time=0.543..0.543 rows=0 loops=588)
Index Cond: (campaignid = b.tbl_campaign_id)
Filter: ((campaignid = COALESCE(campaignid)) AND (date(insert_datetime) >= '2020-05-23'::date) AND (date(insert_datetime) <= '2020-06-23'::date))
Rows Removed by Filter: 656
-> Index Scan using testh21_campaignid_idx on testh21 a_1 (cost=0.42..585.94 rows=1 width=20) (actual time=0.002..0.796 rows=606 loops=588)
Index Cond: (campaignid = b.tbl_campaign_id)
Filter: ((campaignid = COALESCE(campaignid)) AND (date(insert_datetime) >= '2020-05-23'::date) AND (date(insert_datetime) <= '2020-06-23'::date))
-> Index Scan using idx_user_id_tbl_user_c_1592227657_19 on tbl_user_channel f (cost=0.14..0.24 rows=1 width=422) (actual time=0.002..0.004 rows=9 loops=356244)
Index Cond: (user_id = b.user_id)
-> Index Scan using "idx_FK_6958qvy_tbl_user_c_1592228774_151" on tbl_user_configurations g (cost=0.14..0.20 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=3206196)
Index Cond: (user_id = e.user_id)
Filter: (msg_cat_id = COALESCE(msg_cat_id))
Planning Time: 6.561 ms
Execution Time: 27477.860 ms
答案 0 :(得分:0)
在testh21
上进行索引扫描时,结果行总被低估了。结果是PostgreSQL选择了嵌套循环联接,这是您花费时间的地方。
尝试以下操作:
新统计信息:
ANALYZE testh21;
如果这样可以改善估算值,请确保自动分析会更频繁地处理表格。
防止因相关性造成的错误估算:
CREATE STATISTICS testh21_stat (dependencies)
ON campaignid, insert_datetime FROM testh21;
ANALYZE testh21;
也许各列之间存在相关性,从而改善了估计。
更详细的统计信息:尝试在表格的default_statistics_target
之前提高ANALYZE
。
如果您无法改善估算值,请锤击并在查询期间设置enable_nestloop = off
。