Most popular song per day

Date: 2017-08-10 21:17:09

Tags: sql hiveql

I have the following data:

date            user    song
..........      .....   .....
2017-07-12      u1      song1
2017-07-12      u2      song1
2017-07-12      u1      song1
2017-07-12      u2      song2
2017-07-12      u1      song3
2017-07-12      u2      song1
2017-07-12      u1      song2
2017-07-12      u2      song1
2017-07-13      u1      song2      
2017-07-13      u2      song2
2017-07-13      u1      song2
2017-07-13      u2      song1
2017-07-13      u1      song1

I want the following output:

date                       song
..........                 .....
2017-07-12                 song1
2017-07-13                 song2

I can get the play counts and song names, but I can't figure out how to pick the top song for each day. This is the query I used:

SELECT dt, song_name, COUNT(song_name) AS c
FROM es_session
GROUP BY dt, song_name
ORDER BY c, dt DESC

1 answer:

Answer (score: 1)

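What you are looking for is what statistics calls the "mode" (the most frequent value). You can compute it per day with window functions: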

SELECT ds.dt, ds.song_name
FROM (SELECT dt, song_name, COUNT(song_name) AS c,
             -- number the songs within each day, most-played first
             ROW_NUMBER() OVER (PARTITION BY dt ORDER BY COUNT(song_name) DESC) AS seqnum
      FROM es_session
      GROUP BY dt, song_name
     ) ds
WHERE seqnum = 1   -- keep only the top song for each day
ORDER BY ds.dt;
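To check the approach against the sample data without a Hive cluster, here is a minimal sketch that runs a lightly adapted version of the query (selecting only dt and song_name, ordered by date) through Python's built-in sqlite3 module. SQLite is only a stand-in for Hive here, not HiveQL itself, and the sketch assumes a SQLite build with window-function support (3.25 or later). The user_id column name is an assumption, since the question's query never references that column.

```python
import sqlite3

# Sample rows copied from the question. The table name es_session and the
# columns dt and song_name follow the question's query; user_id is an assumed name.
ROWS = [
    ("2017-07-12", "u1", "song1"),
    ("2017-07-12", "u2", "song1"),
    ("2017-07-12", "u1", "song1"),
    ("2017-07-12", "u2", "song2"),
    ("2017-07-12", "u1", "song3"),
    ("2017-07-12", "u2", "song1"),
    ("2017-07-12", "u1", "song2"),
    ("2017-07-12", "u2", "song1"),
    ("2017-07-13", "u1", "song2"),
    ("2017-07-13", "u2", "song2"),
    ("2017-07-13", "u1", "song2"),
    ("2017-07-13", "u2", "song1"),
    ("2017-07-13", "u1", "song1"),
]

# Window-function query: count plays per (dt, song_name), number the songs
# within each day by descending count, then keep row number 1.
TOP_SONG_QUERY = """
SELECT ds.dt, ds.song_name
FROM (SELECT dt, song_name, COUNT(song_name) AS c,
             ROW_NUMBER() OVER (PARTITION BY dt
                                ORDER BY COUNT(song_name) DESC) AS seqnum
      FROM es_session
      GROUP BY dt, song_name) ds
WHERE ds.seqnum = 1
ORDER BY ds.dt
"""


def top_song_per_day(rows):
    """Load rows into an in-memory SQLite table and return (dt, song_name) per day."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE es_session (dt TEXT, user_id TEXT, song_name TEXT)")
        conn.executemany("INSERT INTO es_session VALUES (?, ?, ?)", rows)
        return conn.execute(TOP_SONG_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(top_song_per_day(ROWS))
    # [('2017-07-12', 'song1'), ('2017-07-13', 'song2')]
```

On the question's data this returns song1 for 2017-07-12 (5 plays) and song2 for 2017-07-13 (3 plays), matching the desired output.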

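If two or more songs tie for the top count on a day, ROW_NUMBER() picks one of them arbitrarily. If you want all of the tied songs, use RANK() instead of ROW_NUMBER().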
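The tie-handling difference can be seen with a small hypothetical dataset in which two songs share the highest count on one day. This sketch again uses sqlite3 as a stand-in for Hive (assuming window-function support) and the assumed table layout es_session(dt, user_id, song_name); top_songs is an illustrative helper, not part of any library.

```python
import sqlite3

# Hypothetical data (not from the question): on 2017-07-14 song1 and song2
# are each played twice, so they tie for the top spot.
TIED_ROWS = [
    ("2017-07-14", "u1", "song1"),
    ("2017-07-14", "u2", "song1"),
    ("2017-07-14", "u1", "song2"),
    ("2017-07-14", "u2", "song2"),
    ("2017-07-14", "u1", "song3"),
]


def top_songs(rows, rank_fn):
    """Return the (dt, song_name) rows that rank_fn places first within each day."""
    # Only allow the two ranking functions being compared; this also keeps the
    # f-string below from interpolating arbitrary SQL.
    if rank_fn not in ("ROW_NUMBER", "RANK"):
        raise ValueError("rank_fn must be 'ROW_NUMBER' or 'RANK'")
    query = f"""
        SELECT ds.dt, ds.song_name
        FROM (SELECT dt, song_name,
                     {rank_fn}() OVER (PARTITION BY dt
                                       ORDER BY COUNT(song_name) DESC) AS seqnum
              FROM es_session
              GROUP BY dt, song_name) ds
        WHERE ds.seqnum = 1
        ORDER BY ds.dt, ds.song_name
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE es_session (dt TEXT, user_id TEXT, song_name TEXT)")
        conn.executemany("INSERT INTO es_session VALUES (?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # ROW_NUMBER() keeps exactly one of the tied songs; which one is not guaranteed.
    print(top_songs(TIED_ROWS, "ROW_NUMBER"))
    # RANK() gives both tied songs rank 1, so both are returned.
    print(top_songs(TIED_ROWS, "RANK"))
```

DENSE_RANK() would also return both tied songs here, because with WHERE seqnum = 1 only the first rank matters, and RANK() and DENSE_RANK() agree on it.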