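I have two tables:
users: id | name | ...
pull_requests: id | user_id | created_at | ...
I need to fetch all users together with the number of pull requests each of them made in a particular year. So I wrote a query like this: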
SELECT users.*, COUNT(pull_requests.id) as pull_requests_count
FROM "users" INNER JOIN
"pull_requests"
ON "pull_requests"."user_id" = "users"."id"
WHERE (EXTRACT(year FROM pull_requests.created_at) = 2013)
GROUP BY users.id
Initially I had an index on pull_requests.user_id (btree). Running EXPLAIN gave me this:
QUERY PLAN
--------------------------------------------------------------------------------------------
 HashAggregate  (cost=63.99..64.02 rows=3 width=2775)
   ->  Hash Join  (cost=59.19..63.98 rows=3 width=2775)
         Hash Cond: (users.id = pull_requests.user_id)
         ->  Seq Scan on users  (cost=0.00..4.08 rows=108 width=2771)
         ->  Hash  (cost=59.16..59.16 rows=3 width=8)
               ->  Seq Scan on pull_requests  (cost=0.00..59.16 rows=3 width=8)
                     Filter: (date_part('year'::text, created_at) = 2013::double precision)
Then I added an index like this:
CREATE INDEX pull_req_extract_year_created_at_ix ON pull_requests (EXTRACT(year FROM created_at));
Now my EXPLAIN output is:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=18.93..18.96 rows=3 width=2775)
   ->  Hash Join  (cost=14.13..18.92 rows=3 width=2775)
         Hash Cond: (users.id = pull_requests.user_id)
         ->  Seq Scan on users  (cost=0.00..4.08 rows=108 width=2771)
         ->  Hash  (cost=14.09..14.09 rows=3 width=8)
               ->  Bitmap Heap Scan on pull_requests  (cost=4.28..14.09 rows=3 width=8)
                     Recheck Cond: (date_part('year'::text, created_at) = 2013::double precision)
                     ->  Bitmap Index Scan on pull_req_extract_year_created_at_ix  (cost=0.00..4.28 rows=3 width=0)
                           Index Cond: (date_part('year'::text, created_at) = 2013::double precision)
With around 100 rows, the query still takes 6.6 ms. How can I optimize it further?
Thanks!
Answer 0 (score: 1)
Try combining the two indexes into one:
CREATE INDEX pr_ix ON pull_requests(EXTRACT(year FROM created_at), user_id);
Then phrase the query as:
SELECT users.*, pull_requests_count
FROM "users" INNER JOIN
(select user_id, count(*) as pull_requests_count
from "pull_requests"
WHERE (EXTRACT(year FROM pull_requests.created_at) = 2013)
group by user_id
) pr
ON pr."user_id" = "users"."id";
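The index contains every value the subquery needs (the year expression and user_id), so the idea is that the subquery can be answered from an index scan alone, without reading the table itself. The aggregated result is then joined back to users. Run EXPLAIN again to confirm that the planner actually uses the index this way.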
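Before comparing timings, it may be worth checking that the rewritten query returns the same counts as the original one. The following is only a minimal sketch of such a check: it uses Python's built-in sqlite3 module with an in-memory database instead of PostgreSQL, the sample rows and user names are made up for illustration, and strftime('%Y', ...) stands in for EXTRACT(year FROM ...), which SQLite does not provide. It says nothing about PostgreSQL's query plans or performance.

```python
import sqlite3

# In-memory stand-in for the two tables. SQLite is an assumption made only so
# the sketch runs anywhere; the sample data below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pull_requests (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        created_at TEXT
    );
    INSERT INTO users (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO pull_requests (user_id, created_at) VALUES
        (1, '2013-01-15 10:00:00'),
        (1, '2013-06-01 09:30:00'),
        (1, '2012-12-31 23:59:59'),
        (2, '2013-03-20 12:00:00'),
        (3, '2014-02-02 08:00:00');
""")

# Shape of the query from the question: join first, then filter and group.
# strftime('%Y', ...) replaces PostgreSQL's EXTRACT(year FROM ...).
ORIGINAL = """
    SELECT users.id, users.name, COUNT(pull_requests.id) AS pull_requests_count
    FROM users
    INNER JOIN pull_requests ON pull_requests.user_id = users.id
    WHERE strftime('%Y', pull_requests.created_at) = '2013'
    GROUP BY users.id
    ORDER BY users.id
"""

# Shape of the query from the answer: aggregate pull_requests first,
# then join the per-user counts back to users.
REWRITTEN = """
    SELECT users.id, users.name, pr.pull_requests_count
    FROM users
    INNER JOIN (
        SELECT user_id, COUNT(*) AS pull_requests_count
        FROM pull_requests
        WHERE strftime('%Y', created_at) = '2013'
        GROUP BY user_id
    ) pr ON pr.user_id = users.id
    ORDER BY users.id
"""

original_rows = conn.execute(ORIGINAL).fetchall()
rewritten_rows = conn.execute(REWRITTEN).fetchall()

# Both forms should report the same per-user counts for 2013.
print(original_rows == rewritten_rows)
```

Both forms use an inner join, so users with no pull requests in the chosen year (carol in this sample) are absent from either result.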