How to write a SQL query over a join of two tables that combines GROUP BY with a max value

Time: 2017-05-05 12:28:37

Tags: sql hive

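I have two tables, coaches and awardscoaches. coaches gives me the relationship between coaches and teams. awardscoaches holds the awards each coach has received. Their structures are as follows:

coaches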

coachid: string   (the id of each coach, it is the primary key in this table)
tmid:    string   (the team id)

awardscoaches

coachid: string   (the id of each coach)
award:   string   (the award the coach got; each coach may have more than one award, so the primary key in this table is the combination of coachid and award)

Now I want to write a query that finds, for each team, the coach who has won the most awards.

Here is the SQL I have so far:

select c.tmid tmid, max(a.count) count from coaches c 
inner join (select coachid, count(award) count 
from awardscoaches group by coachid) a 
on a.coachid = c.coachid group by c.tmid; 

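This query returns the maximum award count for each team. But I don't know how to include the coachid in the result set, because I can only select columns that appear in the GROUP BY or in an aggregate. I am looking for a generic SQL statement that meets this requirement.

I tried the command below: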

select coachid,tmid,award_count 
from (select coachid,tmid,award_count
      ,rank() over(partition by tmid order by award_count desc) as rnk
      from (select a.coachid, count(*) over(partition by a.coachid) as award_count,c.tmid 
            from awardscoaches a
            join coaches c on c.coachid=a.coachid
           ) t
      ) t
where rnk = 1

But I get duplicate rows like these:

murrabr01c  WAS 17
murrabr01c  WAS 17
murrabr01c  WAS 17
murrabr01c  WAS 17
krommbo01c  WIJ 10
krommbo01c  WIJ 10
krommbo01c  WIJ 10
krommbo01c  WIJ 10
wattto01c   WIN 7
wattto01c   WIN 7
wattto01c   WIN 7

1 answer:

Answer 0: (score: 0)

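-- struct(cnt, coachid) packs the award count and the coach id into one value.
-- max() compares structs field by field, so the struct with the highest cnt wins.
-- .col2 then extracts the second field (Hive names struct() fields col1, col2, ...).
select      c.tmid                              as tmid
           ,max(struct(a.cnt,a.coachid)).col2   as coachid

from                    coaches c

            join       (select      coachid
                                   ,count(*)    as cnt

                        from        awardscoaches

                        group by    coachid
                        ) a

            on          a.coachid =
                        c.coachid

group by    c.tmid
;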
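
The question asks for a generic SQL statement, and the ranked attempt returns duplicates because `count(*) over(partition by a.coachid)` is a window function: it does not collapse rows, so every award row survives the join and is ranked separately. A minimal sketch of one possible fix is to count with a plain GROUP BY first and rank afterwards. The harness below uses Python's built-in sqlite3 module only to make the example self-contained; this is an assumption, since the original runs on Hive, and it needs an SQLite build with window-function support (3.25 or later). The sample rows (`c1`, `T1`, and so on) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE coaches (coachid TEXT PRIMARY KEY, tmid TEXT);
    CREATE TABLE awardscoaches (
        coachid TEXT,
        award   TEXT,
        PRIMARY KEY (coachid, award)
    );
    -- Hypothetical sample data, not taken from the question.
    INSERT INTO coaches VALUES ('c1', 'T1'), ('c2', 'T1'), ('c3', 'T2');
    INSERT INTO awardscoaches VALUES
        ('c1', 'a1'), ('c1', 'a2'), ('c1', 'a3'),
        ('c2', 'a1'),
        ('c3', 'a1'), ('c3', 'a2');
""")

# Aggregate to one row per coach first, then rank coaches within each team.
QUERY = """
SELECT coachid, tmid, award_count
FROM (
    SELECT c.coachid,
           c.tmid,
           a.award_count,
           RANK() OVER (PARTITION BY c.tmid
                        ORDER BY a.award_count DESC) AS rnk
    FROM coaches c
    JOIN (
        SELECT coachid, COUNT(*) AS award_count
        FROM awardscoaches
        GROUP BY coachid
    ) a ON a.coachid = c.coachid
) t
WHERE rnk = 1
ORDER BY tmid
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With `rank()`, coaches tied for the top count in a team are all returned; swapping in `row_number()` would keep exactly one row per team, chosen arbitrarily among the ties unless the ORDER BY adds a tie-breaker.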