Hive: select the maximum count when grouping by two fields

Asked: 2017-10-10 12:56:16

Tags: sql hive

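I am trying to write a SQL query that finds the most popular artist in each country. The most popular artist is the one with the largest number of ratings >= 8.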

The table structures are as follows:

describe album;
albumid                 string
album_title             string
album_artist            string

describe album_ratings;
userid                  int
albumid                 string
rating                  int

describe cusers;
userid                  int
state                   string
country                 string

Below is the query I wrote, but it does not work.

select album_artist, country, count(rating) 
from album, album_ratings, cusers 
where album.albumid=album_ratings.albumid 
      and album_ratings.userid=cusers.userid 
      and rating>=6 
group by country, album_artist 
having count(rating) = (
                        select max(t.cnt) 
                        from (
                              select count(rating) as cnt 
                              from album, album_ratings, cusers 
                              where album.albumid=album_ratings.albumid 
                              and album_ratings.userid=cusers.userid 
                              and rating>=6 
                              group by country, album_artist
                             ) as t 
                        group by t.country
                        );

2 answers:

Answer 0 (score: 0)

You can use the window function row_number() to find the most popular artist in each country (a higher total rating means more popular):

select *
from (
    select c.country, 
        a.album_artist,
        sum(rating) as total_rating,
        row_number() over (partition by c.country order by sum(rating) desc) as rn
    from cusers c
    join album_ratings r on c.userid = r.userid
    join album a on r.albumid = a.albumid
    where r.rating >= 8
    group by c.country, 
        a.album_artist
    ) t
where rn = 1;

I used sum(rating) because I assumed that ratings should be additive.

Also, always use explicit join syntax rather than the old comma-based joins.

Answer 1 (score: 0)

Learn to use proper, explicit JOIN syntax. Never use commas in the FROM clause.

You can do this with window functions:

select *
from (select album_artist, country, count(*) as cnt,
             row_number() over (partition by country order by count(*) desc) as seqnum
      from album a join
           album_ratings ar
           on a.albumid = ar.albumid join
           cusers u
           on ar.userid = u.userid 
      where rating >= 6 
      group by country, album_artist 
     ) aru
where seqnum = 1;

If you want ties to be returned, use rank() instead of row_number().
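
As a rough sketch of the rank() variant, the query below first counts the ratings per country and artist in a subquery and only then ranks those counts. Splitting it into two levels is my own assumption rather than something the answers require; it keeps the aggregate out of the OVER clause, which some older Hive versions may not accept. The aliases counts, ranked, and rnk are made up for illustration, and the threshold of 8 is taken from the question text, so adjust it if 6 was intended.

```sql
-- Sketch: aggregate first, then rank, so the window function
-- only sees plain columns (no aggregate inside OVER).
select country, album_artist, cnt
from (
    select country,
           album_artist,
           cnt,
           -- rank() gives tied artists the same rank, so all of them are kept
           rank() over (partition by country order by cnt desc) as rnk
    from (
        select u.country,
               a.album_artist,
               count(*) as cnt
        from album a
        join album_ratings ar on a.albumid = ar.albumid
        join cusers u on ar.userid = u.userid
        where ar.rating >= 8   -- assumed threshold, taken from the question text
        group by u.country, a.album_artist
    ) counts
) ranked
where rnk = 1;
```

With rank(), a country where two artists share the highest count returns both rows; with row_number() only one of them would be returned, chosen arbitrarily.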