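I'm trying to find the number of users created within the three months leading up to the most recent user creation, all grouped by state.
Here is a query that works: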
select count(u.id) as numberOfUsers,
s.state
from users u
join states s on u.state_id = s.id
where u.creationdate > (
select max(u2.creationdate)
from users u2
where u2.state_id = s.id
) - interval '3 months'
group by s.state
However, it takes 100 seconds. Can anyone give me a more efficient one?
I was hoping something like this would work:
select count(u.id) as numberOfUsers,
s.state, max(u2.creationdate) as lastCreated
from users u
join states s on u.state_id = s.id
where u.creationdate > lastCreated - interval '3 months'
group by s.state
Answer 0 (score: 3)
Since it does only a single scan, this may perform better:
select count(*) as numberofusers,
state
from ( select id, state_id, creationdate,
max(creationdate) over (partition by state_id) - '3 months'::interval as cutoff
from users
) x
join states on states.id = x.state_id
where creationdate > cutoff
group by state
However, it will chew through a lot of working memory during the initial window aggregate.
Hmm, maybe something more like:
with cutoffs as (
select id, state,
(select max(creationdate)
from users
where users.state_id = states.id) - '3 months'::interval as cutoff
from states)
select count(*) as numberofusers, state
from users
join cutoffs on users.state_id = cutoffs.id
where users.creationdate > cutoff
group by state
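This is an attempt to tease PostgreSQL into doing the right partitioned scan, but it isn't quite ideal. It still does a full table scan, but at least only one. A set-returning function that iterates over the CTE output and emits the outer query's results inside the loop might work best, since it could use the creationdate index for each state.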
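The set-returning function idea above could be sketched roughly as follows. This is only an illustration, not a tested solution: the function name recent_users_per_state is hypothetical, the state column is cast to text because its real type isn't given, and it assumes an index on users (state_id, creationdate) exists so that each per-state lookup can use it.

```sql
-- Hypothetical function; table and column names are taken from the question.
create or replace function recent_users_per_state()
returns table (state text, numberofusers bigint)
language plpgsql stable as $$
declare
    c record;
begin
    -- Iterate over the same rows the "cutoffs" CTE would produce.
    for c in
        select st.id,
               st.state,
               (select max(u.creationdate)
                  from users u
                 where u.state_id = st.id) - interval '3 months' as cutoff
          from states st
    loop
        -- One small query per state; with the assumed index this can be
        -- an index range scan instead of a full scan of users.
        return query
        select c.state::text, count(*)
          from users u
         where u.state_id = c.id
           and u.creationdate > c.cutoff;
    end loop;
end;
$$;

select * from recent_users_per_state();
```

Unlike the join-based queries, this sketch also returns a row with a count of 0 for any state that has no users, because the loop runs over states rather than over users.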
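Answer 1 (score: 2)
Out of interest, how does the following query perform? I'm particularly interested in how PostgreSQL handles the innermost query (the states table plus the scalar subquery).
The users table must have a composite index on (state_id, creationdate) for this to work well.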
select s2.id
,s2.state
,(select count(*)
from users u
where u.state_id = s2.id
and u.creationdate > s2.max_date) as numberOfUsers
from (select s.id
,s.state
,(select max(u.creationdate) - interval '3 months'
from users u
where u.state_id = s.id) as max_date
from states s
) s2;
Edit: Here is the plan generated for that query, with 100,000 user rows across 3 states:
Seq Scan on states s (actual time=4.033..13.949 rows=3 loops=1)
Buffers: shared hit=1743
SubPlan 3
-> Aggregate (actual time=4.636..4.636 rows=1 loops=3)
Buffers: shared hit=1742
InitPlan 2 (returns $2)
-> Result (actual time=0.028..0.028 rows=1 loops=3)
Buffers: shared hit=12
InitPlan 1 (returns $1)
-> Limit (actual time=0.022..0.022 rows=1 loops=3)
Buffers: shared hit=12
-> Index Scan Backward using users_state_id_creationdate_idx on users u (actual time=0.019..0.019 rows=1 loops=3)
Index Cond: ((state_id = $0) AND (creationdate IS NOT NULL))
Buffers: shared hit=12
-> Bitmap Heap Scan on users u (actual time=1.095..3.693 rows=8425 loops=3)
Recheck Cond: ((state_id = $0) AND (creationdate > $2))
Buffers: shared hit=1730
-> Bitmap Index Scan on users_state_id_creationdate_idx (actual time=1.017..1.017 rows=8425 loops=3)
Index Cond: ((state_id = $0) AND (creationdate > $2))
Buffers: shared hit=107
Total runtime: 14.017 ms
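For anyone wanting to reproduce a plan like the one above, a minimal sketch follows. It assumes the composite index does not exist yet; the index name is simply copied from the plan output, and the EXPLAIN target here is only the inner part of the query (the per-state cutoff lookup).

```sql
-- Composite index; the name matches the one shown in the plan above.
create index users_state_id_creationdate_idx
    on users (state_id, creationdate);

-- Refresh planner statistics after building the index.
analyze users;

-- EXPLAIN with ANALYZE and BUFFERS runs the query and reports
-- actual timings and buffer hits, as in the plan above.
explain (analyze, buffers)
select s.id,
       s.state,
       (select max(u.creationdate) - interval '3 months'
          from users u
         where u.state_id = s.id) as max_date
  from states s;
```

If the index is being used, the max() lookup should show up as a backward index scan with a Limit, as it does in the plan above, rather than as a sequential scan on users.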
Answer 2 (score: 1)
Here is the query I used to get the time down to 82 ms:
with cutoffs as (
select max(u.creationdate) - interval '3 months' as cutoff, s.id, s.state
from users u
join states s on u.state_id = s.id
group by s.state, s.id)
select count(*) as numberofusers, state
from users
join cutoffs on users.state_id = cutoffs.id
where users.creationdate > cutoff
group by state
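Thanks to araqnid.
Answer 3 (score: 0)
Are you sure which part of the query is slow? Can you add an index? I'm no Postgres guru, but I suspect that if users isn't indexed on users.creationdate, the MAX() function will have to do a full table scan. Hmm, it may have to do one anyway...
That said, here goes nothing!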
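SELECT u.numUsers, s.state FROM
(SELECT count(id) as numUsers, state_id
FROM users
-- An aggregate cannot appear directly in WHERE, so MAX() goes in a scalar subquery.
-- Note: this cutoff is the latest creationdate across all users, not per state.
WHERE creationdate > (SELECT MAX(creationdate) FROM users) - interval '3 Months'
GROUP BY state_id) u
left join states s on u.state_id = s.id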