Selecting the n largest counts per category in Redshift

Time: 2017-07-28 00:31:01

Tags: sql amazon-redshift

I want to select the X most common pairs per group in a table. Consider the following table:

+-------------+-----------+
| identifier  |    city   |
+-------------+-----------+
| AB          |  Seattle  |
| AC          |  Seattle  |
| AC          |  Seattle  |
| AB          |  Seattle  |
| AD          |  Seattle  |
| AB          |  Chicago  |
| AB          |  Chicago  |
| AD          |  Chicago  |
| AD          |  Chicago  |
| BC          |  Chicago  |
+-------------+-----------+
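  • In Seattle, AB occurs 2 times
  • In Seattle, AC occurs 2 times
  • In Seattle, AD occurs 1 time
  • In Chicago, AB occurs 2 times
  • In Chicago, AD occurs 2 times
  • In Chicago, BC occurs 1 time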
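
For anyone who wants to reproduce the counts above, here is a minimal sketch. The table name `tbl` and the column types are assumptions (the question does not give them), and the `occurrences` alias is only illustrative.

```sql
-- Assumed setup: table name and column types are illustrative, not from the question.
create temp table tbl (
    identifier varchar(8),
    city       varchar(32)
);

insert into tbl (identifier, city) values
    ('AB', 'Seattle'), ('AC', 'Seattle'), ('AC', 'Seattle'), ('AB', 'Seattle'), ('AD', 'Seattle'),
    ('AB', 'Chicago'), ('AB', 'Chicago'), ('AD', 'Chicago'), ('AD', 'Chicago'), ('BC', 'Chicago');

-- Number of occurrences of each (city, identifier) pair.
select city, identifier, count(*) as occurrences
from tbl
group by city, identifier
order by city, occurrences desc, identifier;
```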

If I want to select the 2 most common identifiers per city, the result should be:

+-------------+-----------+
| identifier  |    city   |
+-------------+-----------+
| AB          |  Seattle  |
| AC          |  Seattle  |
| AB          |  Chicago  |
| AD          |  Chicago  |
+-------------+-----------+

Any help is appreciated. Thanks, Benny

2 Answers:

Answer 0: (score: 2)

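You can use count(*) inside row_number() to order each city/identifier combination by its number of occurrences, then select the first two per city.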

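select city, identifier
from (
    -- number the identifiers within each city, most frequent first;
    -- identifier is the tie-breaker, so the ordering is deterministic
    select city, identifier,
           row_number() over (partition by city order by count(*) desc, identifier) as rnum_cnt
    from tbl
    group by city, identifier
) t
where rnum_cnt <= 2  -- keep the two most frequent identifiers per city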
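
Because row_number() breaks ties by identifier, a city where more than two identifiers share the top counts will return only the first two alphabetically. If tied identifiers should all be kept, one possible variant (a sketch, not part of the original answer) swaps in rank(); a city can then return more than two rows. The alias `rnk` is hypothetical.

```sql
-- Variant of the query above: rank() gives equal counts the same rank,
-- so identifiers tied at the cutoff are all returned.
select city, identifier
from (
    select city, identifier,
           rank() over (partition by city order by count(*) desc) as rnk
    from tbl
    group by city, identifier
) t
where rnk <= 2;
```

On the sample data both queries return the same four rows, since no tie straddles the cutoff.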

Answer 1: (score: 0)

Use a WITH clause:

with
    _counts as (
        select
            identifier,
            city,
            count(*) as city_id_count
        from
            t1
        group by
            identifier,
            city
    ),

    _counts_and_max as (
        select
            identifier,
            city,
            city_id_count,
            max(city_id_count) over (partition by city) as city_max_count
        from
            _counts
    )

    select
        identifier,
        city
    from
        _counts_and_max
    where
        city_id_count = city_max_count
    ;
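
Note that the max()-based filter keeps only the identifiers tied for the highest count in each city. That matches the sample data, but it is not a general top-N. A sketch of how the same CTE layout might be adapted to return at most N rows per city follows; the names `_ranked` and `rn` are hypothetical, and the table `t1` is taken from the answer above.

```sql
with
    _counts as (
        select identifier, city, count(*) as city_id_count
        from t1
        group by identifier, city
    ),
    _ranked as (  -- hypothetical CTE name
        select identifier, city,
               -- position within the city, highest count first; identifier breaks ties
               row_number() over (partition by city order by city_id_count desc, identifier) as rn
        from _counts
    )
select identifier, city
from _ranked
where rn <= 2;  -- 2 is the N from the question; adjust as needed
```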