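SQL: Rank / filter by overall rank

Asked: 2016-12-27 09:43:59

Tags: time-series amazon-redshift

I have a fairly standard "append-only" table in Amazon Redshift with created_at and group_name as columns.

I want to generate a time series of the top N groups by row count over the last [time range].

Currently I use this: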

SELECT
    date_trunc('day', created_at) AS timeseries,
    my_table.group_name,
    COUNT(*) AS count
FROM
    my_table
JOIN (
    SELECT
        group_name,
        ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rank
    FROM
        my_table
    WHERE
        created_at > (CURRENT_DATE - INTERVAL '1 days')
    GROUP BY
        group_name
    ) ranking ON (ranking.group_name = my_table.group_name)
WHERE
    created_at > (CURRENT_DATE - INTERVAL '1 days')
GROUP BY
    timeseries,
    my_table.group_name,
    ranking.rank
HAVING 
    ranking.rank <= 5
ORDER BY
    timeseries DESC

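This is fragile, because the created_at range filter appears twice and both copies must be kept in sync whenever the range changes.

Is there a way to make this query more elegant (ideally using the time filter only once)?

3 answers:

Answer 0: (score: 0)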

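You can add a join condition on created_at. For example, compute the maximum and minimum of created_at in the subquery and keep only the rows that fall between them:
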
SELECT
    date_trunc('day', created_at) AS timeseries,
    my_table.group_name,
    COUNT(*) AS count
FROM
    my_table
JOIN (
    SELECT
        group_name,
        max(created_at) as max_created,
        min(created_at) as min_created,
        ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rank
    FROM
        my_table
    WHERE
        created_at > (CURRENT_DATE - INTERVAL '1 days')
    GROUP BY
        group_name
    ) ranking ON (ranking.group_name = my_table.group_name)
AND created_at between min_created and max_created
GROUP BY
    timeseries,
    my_table.group_name,
    ranking.rank
HAVING 
    ranking.rank <= 5
ORDER BY
    timeseries DESC

Also, I'm sure there is a more elegant way to compute this without reading the same table twice.

Answer 1: (score: 0)
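
A minimal sketch of one such approach, assuming the goal is the daily counts of the 5 groups with the most rows over the whole range. The table is read once and the time filter appears once; the ranking is derived from the aggregated rows with window functions. The names daily_counts, with_totals, ranked, cnt, group_total and group_rank are illustrative and not from the question, and the query has not been run against a real Redshift cluster:

```sql
WITH daily_counts AS (
    -- The only place the time filter appears.
    SELECT
        date_trunc('day', created_at) AS timeseries,
        group_name,
        COUNT(*) AS cnt
    FROM my_table
    WHERE created_at > (CURRENT_DATE - INTERVAL '1 days')
    GROUP BY 1, 2
),
with_totals AS (
    -- Attach each group's total over the whole range to its daily rows.
    SELECT
        timeseries,
        group_name,
        cnt,
        SUM(cnt) OVER (PARTITION BY group_name) AS group_total
    FROM daily_counts
),
ranked AS (
    -- All rows of a group share one rank; group_name breaks ties
    -- between groups that have the same total.
    SELECT
        timeseries,
        group_name,
        cnt,
        DENSE_RANK() OVER (ORDER BY group_total DESC, group_name) AS group_rank
    FROM with_totals
)
SELECT timeseries, group_name, cnt
FROM ranked
WHERE group_rank <= 5
ORDER BY timeseries DESC;
```

The two window steps are kept in separate CTEs because a window function cannot be nested inside another one. Changing the range or N then means editing a single WHERE clause each.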

Try this; it should also be faster.

    SELECT
        ranking.date AS timeseries,
        ranking.group_name,
        COUNT(*) AS count
    FROM
        my_table
    JOIN (
        SELECT
            group_name,
            date(created_at) as date,
            ROW_NUMBER() OVER (PARTITION BY date(created_at) ORDER BY COUNT(*) DESC) AS rank
        FROM
            my_table
        WHERE
            created_at > (CURRENT_DATE - INTERVAL '1 days')
        GROUP BY
            group_name,
            date(created_at)
        ) ranking ON ranking.group_name = my_table.group_name
            AND ranking.date = date(my_table.created_at)
 WHERE rank <=5
 GROUP BY 1,2

Answer 2: (score: 0)

I don't think I fully understand your requirement, but this query should give the top 5 groups for each day.

select timeseries, group_name, count from (
    select timeseries, group_name, count,
        row_number() over (partition by timeseries order by count desc) as rank
    from (
        select date_trunc('day', created_at) AS timeseries,
            group_name,
            count(*) AS count
        from my_table
        where created_at > sysdate - '1 day'::interval
        group by 1,2
    )
) where rank <= 5
order by 1 desc

This query should give the daily counts for the overall top 5 groups:

with daily_counts as (
    select date_trunc('day', created_at) AS timeseries,
        group_name,
        count(*) AS count
    from my_table
    where created_at > sysdate - '1 day'::interval
    group by 1,2
)
select d.timeseries, d.group_name, d.count
from daily_counts d
join (
    select group_name, sum(count) as total
    from daily_counts
    group by group_name order by total desc
    limit 5
) r on d.group_name=r.group_name
order by 1,3 desc