I have a complex query:
SELECT DISTINCT ON (delivery.id)
delivery.id, dl_processing.pid
FROM mailer.mailer_message_recipient_rel AS delivery
JOIN mailer.mailer_message AS message ON delivery.message_id = message.id
JOIN mailer.mailer_message_recipient_rel_log AS dl_processing ON dl_processing.rel_id = delivery.id AND dl_processing.status = 1000
-- LEFT JOIN mailer.mailer_recipient AS r ON delivery.email = r.email
JOIN mailer.mailer_mailing AS mailing ON message.mailing_id = mailing.id
WHERE
NOT EXISTS (SELECT dl_finished.id FROM mailer.mailer_message_recipient_rel_log AS dl_finished WHERE dl_finished.rel_id = delivery.id AND dl_finished.status <> 1000) AND
dl_processing.date <= NOW() - (36000 * INTERVAL '1 second') AND
NOT EXISTS (SELECT ml.id FROM mailer.mailer_message_log AS ml WHERE ml.message_id = message.id) AND
-- (r.times_bounced < 5 OR r.times_bounced IS NULL) AND
NOT EXISTS (SELECT ur.id FROM mailer.mailer_unsubscribed_recipient AS ur WHERE ur.email = delivery.email AND ur.list_id = mailing.list_id)
ORDER BY delivery.id, dl_processing.id DESC
LIMIT 1000;
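It runs very slowly, and the reason seems to be that Postgres consistently avoids merge joins in its query plans, even though I have all the indexes it would need. It looks really depressing: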
http://explain.depesz.com/s/tVY
http://i.stack.imgur.com/Myw4R.png
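Why is this happening? How do I troubleshoot this kind of problem?
UPD: With the help of @wildplasser I reworked the query to fix the performance (changing its semantics slightly):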
SELECT delivery.id, dl_processing.pid
FROM mailer.mailer_message_recipient_rel AS delivery
JOIN mailer.mailer_message AS message ON delivery.message_id = message.id
JOIN mailer.mailer_message_recipient_rel_log AS dl_processing ON dl_processing.rel_id = delivery.id AND dl_processing.status in (1000, 2, 5) AND dl_processing.date <= NOW() - (36000 * INTERVAL '1 second')
LEFT JOIN mailer.mailer_recipient AS r ON delivery.email = r.email
WHERE
(r.times_bounced < 5 OR r.times_bounced IS NULL) AND
NOT EXISTS (SELECT dl_other.id FROM mailer.mailer_message_recipient_rel_log AS dl_other WHERE dl_other.rel_id = delivery.id AND dl_other.id > dl_processing.id) AND
NOT EXISTS (SELECT ml.id FROM mailer.mailer_message_log AS ml WHERE ml.message_id = message.id) AND
NOT EXISTS (SELECT ur.id FROM mailer.mailer_unsubscribed_recipient AS ur JOIN mailer.mailer_mailing AS mailing ON message.mailing_id = mailing.id WHERE ur.email = delivery.email AND ur.list_id = mailing.list_id)
ORDER BY delivery.id
LIMIT 1000
It runs well now, but the query plan still contains those scary nested loop joins <_<:
http://explain.depesz.com/s/MTo3
I would still like to know why that is.
Answer 0 (score: 5)
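The reason is that Postgres is actually doing the right thing, and I am bad at math. Suppose table A has N rows, table B has M rows, and they are joined on a column for which both have a B-tree index. Then the following is true: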
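[...] is only needed when an ORDER clause requires a particular order of rows, because, as we will see, it is not a bad deal. So basically, despite being associated with the merge sort we all love, a merge join almost always sucks.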
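To see what the planner is actually choosing between, one option is to discourage nested loops for a single transaction and compare the resulting plan and timings. This is only a rough sketch: the query is a simplified two-table version of the one in the question, not the full original, and `enable_nestloop = off` does not forbid nested loops, it only makes the planner treat them as very expensive.

```sql
BEGIN;

-- Penalize nested loop joins for this transaction only, so the planner
-- falls back to a merge join or hash join where it can.
SET LOCAL enable_nestloop = off;

-- Simplified illustration query (assumption: not the full query from the question).
-- EXPLAIN ANALYZE really executes the statement, so expect it to take a while.
EXPLAIN (ANALYZE, BUFFERS)
SELECT delivery.id, dl_processing.pid
FROM mailer.mailer_message_recipient_rel AS delivery
JOIN mailer.mailer_message_recipient_rel_log AS dl_processing
  ON dl_processing.rel_id = delivery.id
 AND dl_processing.status = 1000
ORDER BY delivery.id
LIMIT 1000;

-- SET LOCAL is reverted when the transaction ends.
ROLLBACK;
```

If the forced plan comes out slower than the default one, that suggests the planner's preference for nested loops was justified for this data.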
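My first query was so slow because it had to perform a sort before applying the limit, and it was bad in many other ways too. After applying @wildplasser's suggestions, I managed to reduce the number of (still expensive) nested loops and also made it possible to apply the limit without a sort. That means Postgres most likely does not need to run the outer scan to completion, which is where most of the performance gain came from.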
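The difference between the two shapes can be checked by looking for a Sort node underneath the Limit node in the plan. The sketch below again uses simplified two-table queries rather than the full originals, and it assumes that `delivery.id` is backed by a B-tree index (for example as the primary key), which the question does not state explicitly.

```sql
-- Shape of the first query: the ordering spans columns from two tables,
-- so the planner may have to sort the joined rows before it can apply LIMIT.
EXPLAIN
SELECT DISTINCT ON (delivery.id) delivery.id, dl_processing.pid
FROM mailer.mailer_message_recipient_rel AS delivery
JOIN mailer.mailer_message_recipient_rel_log AS dl_processing
  ON dl_processing.rel_id = delivery.id
 AND dl_processing.status = 1000
ORDER BY delivery.id, dl_processing.id DESC
LIMIT 1000;

-- Shape of the reworked query: the ordering uses only delivery.id, so an
-- index scan on delivery can supply the rows already ordered (assumption:
-- such an index exists) and the scan can stop once 1000 rows are produced.
EXPLAIN
SELECT delivery.id, dl_processing.pid
FROM mailer.mailer_message_recipient_rel AS delivery
JOIN mailer.mailer_message_recipient_rel_log AS dl_processing
  ON dl_processing.rel_id = delivery.id
 AND dl_processing.status IN (1000, 2, 5)
ORDER BY delivery.id
LIMIT 1000;
```

Plain `EXPLAIN` only shows the estimated plan without running the query, so it is cheap to try on both variants.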