Select the top 20 distinct rows in each category

Asked: 2018-07-17 16:56:14

Tags: sql hive

I have a database table in the following format.

Product   | Date      | Score 
A | 01/01/18 | 99
B | 01/01/18 | 98
C | 01/01/18 | 97
--------------------------
A | 02/01/18 | 99
B | 02/01/18 | 98
C | 02/01/18 | 97
--------------------------
D | 03/01/18 | 99
A | 03/01/18 | 98
B | 03/01/18 | 97
C | 03/01/18 | 96

I want to pick one row per month in such a way that no product is repeated. For example, the output for the table above should be

Product   | Date      | Score 
A | 01/01/18 | 99
B | 02/01/18 | 98
D | 03/01/18 | 99

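How can I get this result with a single SQL query? The actual table is much larger than this, and I want the top 20 for each month with no product repeated.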

3 Answers:

Answer 0 (score: 0)

This is a hard problem: a kind of subgraph problem that is not really suited to SQL. There is a brute-force approach:

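-- limit 1 matches the one-per-month example; use limit 20 for the top 20
with jan as (
      -- highest-scoring product in January
      select *
      from t
      where date = '2018-01-01'
      order by score desc
      limit 1
     ),
     feb as (
      -- highest-scoring February product not already taken by January
      select *
      from t
      where date = '2018-02-01' and
            product not in (select product from jan)
      order by score desc
      limit 1
     ),
     mar as (
      -- highest-scoring March product not taken by January or February
      select *
      from t
      where date = '2018-03-01' and
            product not in (select product from jan) and
            product not in (select product from feb)
      order by score desc
      limit 1
     )
select *
from jan
union all
select *
from feb
union all
select *
from mar;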

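You can generalize this with additional CTEs. However, it does not guarantee that every month ends up with a product, even when some other assignment would have given it one.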
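The chained CTEs amount to a greedy pass over the months. As a minimal sketch outside SQL, assuming the rows are (product, date, score) tuples with ISO-formatted dates, the same procedure for any number of months and any per-month limit might look like the following. The helper name `pick_top_per_month` and the `tricky` data set are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def pick_top_per_month(rows, limit=1):
    """Greedy pass: walk the months in order and keep up to `limit` of the
    highest-scoring products in each month that no earlier month has used."""
    by_month = defaultdict(list)
    for product, date, score in rows:
        by_month[date].append((product, date, score))

    used = set()    # products already assigned to an earlier month
    picked = []
    for date in sorted(by_month):    # ISO dates sort chronologically
        ranked = sorted(by_month[date], key=lambda r: -r[2])
        taken = 0
        for row in ranked:
            if taken == limit:
                break
            if row[0] in used:
                continue    # product was already taken, try the next best
            used.add(row[0])
            picked.append(row)
            taken += 1
    return picked


# Sample data from the question, with the dates written as ISO strings.
rows = [
    ("A", "2018-01-01", 99), ("B", "2018-01-01", 98), ("C", "2018-01-01", 97),
    ("A", "2018-02-01", 99), ("B", "2018-02-01", 98), ("C", "2018-02-01", 97),
    ("D", "2018-03-01", 99), ("A", "2018-03-01", 98), ("B", "2018-03-01", 97),
    ("C", "2018-03-01", 96),
]
result = pick_top_per_month(rows, limit=1)

# Made-up case where the greedy pass leaves February empty, although picking
# B for January and A for February would have filled both months.
tricky = [("A", "2018-01-01", 99), ("B", "2018-01-01", 98), ("A", "2018-02-01", 97)]
tricky_result = pick_top_per_month(tricky, limit=1)
```

On the question's sample data the pass selects A for January, B for February and D for March. The `tricky` case shows the limitation mentioned above: January takes A, so February, which only offers A, gets nothing.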

Answer 1 (score: 0)

You can use row_number.

select *
from (
      -- rank within each month by score; repeats across months are not removed
      select row_number() over (partition by Date order by Score desc) as rno, p.*
      from Products p
     ) as t
where t.rno <= 20
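A row number computed per month ranks the rows, but on its own it does not stop a product from being selected in more than one month. As a rough illustration, assuming the question's sample data as (product, date, score) tuples with ISO dates, this sketch mimics a ROW_NUMBER() partitioned by date and ordered by score descending. The helper name `rank_within_month` is hypothetical.

```python
from itertools import groupby


def rank_within_month(rows):
    """Mimic ROW_NUMBER() OVER (PARTITION BY date ORDER BY score DESC)."""
    # Sort by date, then by score descending, so groupby sees each month once.
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    ranked = []
    for _, group in groupby(ordered, key=lambda r: r[1]):
        for rno, (product, date, score) in enumerate(group, start=1):
            ranked.append((rno, product, date, score))
    return ranked


rows = [
    ("A", "2018-01-01", 99), ("B", "2018-01-01", 98), ("C", "2018-01-01", 97),
    ("A", "2018-02-01", 99), ("B", "2018-02-01", 98), ("C", "2018-02-01", 97),
    ("D", "2018-03-01", 99), ("A", "2018-03-01", 98), ("B", "2018-03-01", 97),
    ("C", "2018-03-01", 96),
]

# Keep only the best-ranked row of each month (the rno <= 1 filter).
top1 = [r for r in rank_within_month(rows) if r[0] <= 1]
```

Here `top1` contains product A for both January and February, so a per-month ranking still needs a separate step to exclude products that were already chosen.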

Answer 2 (score: 0)

I think you want the top 20 records for each month without repeating products. If so, the solution below should work.

-- sample data (SQL Server syntax: temp table built from a VALUES list)
select *
into #temp
from
(values
 ('A','01/01/18',99)
,('B','01/01/18',98)
,('C','01/01/18',97)
,('A','02/01/18',99)
,('B','02/01/18',98)
,('C','02/01/18',97)
,('D','03/01/18',99)
,('A','03/01/18',98)
,('B','03/01/18',97)
,('C','03/01/18',96)
) AS VTE (Product, Date, Score)

-- rank within each month by score, highest first, and keep the top 20
-- note: a product can still appear in more than one month
select * from
(
    select *, ROW_NUMBER() over (partition by Date order by Score desc) as rn
    from #temp
) A
where rn <= 20