I have the following tables:
users (id, network_id)
networks (id)
private_messages (id, sender_id, receiver_id, created_at)
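I have indexes on users.network_id and on all 3 columns in private_messages, but the query is skipping the indexes and taking a very long time to run. Any ideas what in the query is causing the indexes to be skipped?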
EXPLAIN ANALYZE SELECT COUNT(*)
FROM "networks"
WHERE (
networks.created_at BETWEEN ((timestamp '2013-01-01')) AND (( (timestamp '2013-01-31') + interval '-1 second'))
AND (SELECT COUNT(*) FROM private_messages INNER JOIN users ON private_messages.receiver_id = users.id WHERE users.network_id = networks.id AND (private_messages.created_at BETWEEN ((timestamp '2013-03-01')) AND (( (timestamp '2013-03-31') + interval '-1 second'))) ) > 0)
Result:
Aggregate (cost=722675247.10..722675247.11 rows=1 width=0) (actual time=519916.108..519916.108 rows=1 loops=1)
-> Seq Scan on networks (cost=0.00..722675245.34 rows=703 width=0) (actual time=2576.205..519916.044 rows=78 loops=1)
Filter: ((created_at >= '2013-01-01 00:00:00'::timestamp without time zone) AND (created_at <= '2013-01-30 23:59:59'::timestamp without time zone) AND ((SubPlan 1) > 0))
SubPlan 1
-> Aggregate (cost=50671.34..50671.35 rows=1 width=0) (actual time=240.359..240.359 rows=1 loops=2163)
-> Hash Join (cost=10333.69..50671.27 rows=28 width=0) (actual time=233.997..240.340 rows=13 loops=2163)
Hash Cond: (private_messages.receiver_id = users.id)
-> Bitmap Heap Scan on private_messages (cost=10127.11..48675.15 rows=477136 width=4) (actual time=56.599..232.855 rows=473686 loops=1809)
Recheck Cond: ((created_at >= '2013-03-01 00:00:00'::timestamp without time zone) AND (created_at <= '2013-03-30 23:59:59'::timestamp without time zone))
-> Bitmap Index Scan on index_private_messages_on_created_at (cost=0.00..10007.83 rows=477136 width=0) (actual time=54.551..54.551 rows=473686 loops=1809)
Index Cond: ((created_at >= '2013-03-01 00:00:00'::timestamp without time zone) AND (created_at <= '2013-03-30 23:59:59'::timestamp without time zone))
-> Hash (cost=205.87..205.87 rows=57 width=4) (actual time=0.218..0.218 rows=2 loops=2163)
Buckets: 1024 Batches: 1 Memory Usage: 0kB
-> Index Scan using index_users_on_network_id on users (cost=0.00..205.87 rows=57 width=4) (actual time=0.154..0.215 rows=2 loops=2163)
Index Cond: (network_id = networks.id)
Total runtime: 519916.183 ms
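Thanks.
Answer 0 (score: 2)
Let's try something different. I am only posting this as an "answer" because of its length, and because you cannot format a comment. Let's approach the query modularly, as a series of subsets that need to be intersected, and see how long each one takes to execute (please report the timings). Substitute your timestamps for t1 and t2. Note how each query builds on the previous one, turning the previous one into an "inline view".
EDIT: Also, please confirm the columns in the networks table.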
-- Step 1: receivers of messages created in the window
select PM.receiver_id from private_messages PM
where PM.created_at between t1 and t2;

-- Step 2: users who received those messages
select U.id, U.network_id from users U
join
(
select PM.receiver_id from private_messages PM
where PM.created_at between t1 and t2
) as FOO
on U.id = FOO.receiver_id;

-- Step 3: networks those users belong to
select N.* from networks N
join
(
select U.id, U.network_id from users U
join
(
select PM.receiver_id from private_messages PM
where PM.created_at between t1 and t2
) as FOO
on U.id = FOO.receiver_id
) as BAR
on N.id = BAR.network_id;
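To report timings for each step, one option is to prefix the step with EXPLAIN ANALYZE, as in the question. The sketch below does this for the first step only, and it assumes that t1 and t2 are the March 2013 bounds from the original query; substitute whatever window you actually need.

```sql
-- Step 1 with concrete bounds (assumed: the March window from the question).
-- EXPLAIN ANALYZE runs the query and reports the actual time per plan node.
EXPLAIN ANALYZE
SELECT PM.receiver_id
FROM private_messages PM
WHERE PM.created_at
      BETWEEN timestamp '2013-03-01'
          AND timestamp '2013-03-31' + interval '-1 second';
```

Repeating this for the second and third steps should show at which level of nesting the time starts to grow.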
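Answer 1 (score: 1)
First, I think you want an index on networks.created_at, even though right now, with more than 10% of the table matching the WHERE clause, it probably won't be used.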
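A minimal sketch of that index follows. The index name is hypothetical and simply follows the naming pattern visible in the posted plan (index_private_messages_on_created_at); it also assumes that networks really has a created_at column, which the table listing in the question does not show.

```sql
-- Hypothetical index name, modeled on the existing index names in the plan.
CREATE INDEX index_networks_on_created_at
    ON networks (created_at);
```

Whether the planner uses it depends on how selective the date range is, so it is worth re-running EXPLAIN ANALYZE afterwards rather than assuming a change.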
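Next, I expect you will get better speed if you put as much of the logic as possible into a single query, rather than splitting part of it off into a subquery. I believe the plan shows the subquery being re-run for each value of networks.id that is scanned (note loops=2163 on SubPlan 1); usually a single all-at-once join works better.
I think the code below is logically equivalent. If it isn't, it should be close.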
SELECT COUNT(*)
FROM
(SELECT users.network_id FROM "networks"
JOIN users
ON users.network_id = networks.id
JOIN private_messages
ON private_messages.receiver_id = users.id
AND (private_messages.created_at
BETWEEN ((timestamp '2013-03-01'))
AND (( (timestamp '2013-03-31') + interval '-1 second')))
WHERE
networks.created_at
BETWEEN ((timestamp '2013-01-01'))
AND (( (timestamp '2013-01-31') + interval '-1 second'))
GROUP BY users.network_id)
AS main_subquery
;
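In my experience, you will get the same query plan if you move the networks.created_at condition into the ON clause of the users-networks join. I don't think your problem is the timestamps; it is the structure of the query. You may also get a better (or worse) plan by replacing the GROUP BY with SELECT DISTINCT users.network_id in the subquery.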
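As a sketch of those two variations combined, the query could be written as below. It keeps the date bounds exactly as in the question, moves the networks.created_at condition into the ON clause, and swaps GROUP BY for SELECT DISTINCT. It is offered as something to compare under EXPLAIN ANALYZE, not as a guaranteed improvement.

```sql
SELECT COUNT(*)
FROM
  (SELECT DISTINCT users.network_id
   FROM networks
   JOIN users
     ON users.network_id = networks.id
    -- For an inner join, filtering here is equivalent to filtering in WHERE.
    AND networks.created_at
        BETWEEN timestamp '2013-01-01'
            AND timestamp '2013-01-31' + interval '-1 second'
   JOIN private_messages
     ON private_messages.receiver_id = users.id
    AND private_messages.created_at
        BETWEEN timestamp '2013-03-01'
            AND timestamp '2013-03-31' + interval '-1 second'
  ) AS main_subquery;
```

Because all joins here are inner joins, the result should match the GROUP BY version; only the plan the optimizer picks may differ.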