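I have a fairly standard append-only table in Amazon Redshift that has created_at and group_name among its columns.
I would like to generate a time series of row counts for the top N groups over the past [time range].
Currently I use this: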
SELECT
date_trunc('day', created_at) AS timeseries,
my_table.group_name,
COUNT(*) AS count
FROM
my_table
JOIN (
SELECT
group_name,
ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rank
FROM
my_table
WHERE
created_at > (CURRENT_DATE - INTERVAL '1 days')
GROUP BY
group_name
) ranking ON (ranking.group_name = my_table.group_name)
WHERE
created_at > (CURRENT_DATE - INTERVAL '1 days')
GROUP BY
timeseries,
my_table.group_name,
ranking.rank
HAVING
ranking.rank <= 5
ORDER BY
timeseries DESC
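This is fragile to change, because the created_at range filter appears twice, and updating one copy but not the other would silently produce wrong results.
Is there a way to make this query more elegant (ideally using the time filter only once)?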
Answer 0 (score: 0)
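You can add a join condition on created_at instead. For example, compute the maximum and minimum created_at per group in the subquery and keep only the rows that fall between them:
SELECT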
date_trunc('day', created_at) AS timeseries,
my_table.group_name,
COUNT(*) AS count
FROM
my_table
JOIN (
SELECT
group_name,
max(created_at) AS max_created,
min(created_at) AS min_created,
ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rank
FROM
my_table
WHERE
created_at > (CURRENT_DATE - INTERVAL '1 days')
GROUP BY
group_name
) ranking ON (ranking.group_name = my_table.group_name)
AND my_table.created_at BETWEEN ranking.min_created AND ranking.max_created
GROUP BY
timeseries,
my_table.group_name,
ranking.rank
HAVING
ranking.rank <= 5
ORDER BY
timeseries DESC
Also, I am sure there is a more elegant way to compute this without reading the same table twice.
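One possible way to read the table only once is sketched below. It assumes the goal is the question's original one (the top 5 groups by total count over the whole window, with daily counts for each). The CTE names `daily_counts`, `with_totals`, and `ranked`, and the columns `cnt`, `group_total`, and `group_rank`, are made up for this illustration. The time filter appears in a single place, and the ranking is done with window functions over the already aggregated rows.

```sql
-- Sketch only: filter and aggregate once, then rank groups with window functions.
WITH daily_counts AS (
    SELECT
        date_trunc('day', created_at) AS timeseries,
        group_name,
        COUNT(*) AS cnt
    FROM my_table
    WHERE created_at > (CURRENT_DATE - INTERVAL '1 day')  -- the only time filter
    GROUP BY 1, 2
),
with_totals AS (
    -- Attach each group's total over the whole window to every daily row.
    SELECT
        timeseries,
        group_name,
        cnt,
        SUM(cnt) OVER (PARTITION BY group_name) AS group_total
    FROM daily_counts
),
ranked AS (
    -- All rows of one group share (group_total, group_name), so they share one rank.
    -- Ordering by group_name as well breaks ties, so exactly 5 groups pass rank <= 5.
    SELECT
        timeseries,
        group_name,
        cnt,
        DENSE_RANK() OVER (ORDER BY group_total DESC, group_name) AS group_rank
    FROM with_totals
)
SELECT timeseries, group_name, cnt
FROM ranked
WHERE group_rank <= 5
ORDER BY timeseries DESC, cnt DESC;
```

The ranking is split into two steps because a window function cannot be nested inside another window function's ORDER BY. Changing the time range now means editing only the WHERE clause in `daily_counts`.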
Answer 1 (score: 0)
Try this; it should also be faster.
SELECT
ranking.date AS timeseries,
ranking.group_name,
COUNT(*) AS count
FROM
my_table
JOIN (
SELECT
group_name,
date(created_at) as date,
ROW_NUMBER() OVER (PARTITION BY date(created_at) ORDER BY COUNT(*) DESC) AS rank
FROM
my_table
WHERE
created_at > (CURRENT_DATE - INTERVAL '1 days')
GROUP BY
group_name,
date(created_at)
) ranking ON ranking.group_name = my_table.group_name
AND ranking.date = date(my_table.created_at)
WHERE ranking.rank <= 5
GROUP BY 1,2
Answer 2 (score: 0)
I don't think I fully understand your requirement, but this query should give the top 5 groups for each day.
select timeseries, group_name, count from (
select timeseries, group_name, count,
row_number() over (partition by timeseries order by count desc) as rank
from (
select date_trunc('day', created_at) AS timeseries,
group_name,
count(*) AS count
from my_table
where created_at > sysdate - '1 day'::interval
group by 1,2
)
) where rank <= 5
order by 1 desc
This query should give the daily counts for the overall top 5 groups:
with daily_counts as (
select date_trunc('day', created_at) AS timeseries,
group_name,
count(*) AS count
from my_table
where created_at > sysdate - '1 day'::interval
group by 1,2
)
select d.timeseries, d.group_name, d.count
from daily_counts d
join (
select group_name, sum(count) as total
from daily_counts
group by group_name order by total desc
limit 5
) r on d.group_name=r.group_name
order by 1,3 desc