Question

I have the following table (player) with these columns:

playerId  score  teamId

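This table contains information about every player on every team. playerId is the primary key. Each team has multiple players, so teamId contains many duplicate values. score is each player's score.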

I want to write a Hive SQL query that finds the highest score for each team. Here is the query I tried:

select max(score) score, teamId from player group by teamId

This query works, but it only returns teamId and the highest score. I also want to retrieve the playerId. If I add playerId to the select list, I get the following error:

org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10002]: Line 1:32 Invalid column reference 'playerId'

It seems I can only select columns that appear in the group by. How can I write the query so that it also returns the playerId?

Answer 1

In Hive, you should use a window function for this:

select p.playerId, p.score, p.teamId
from (select p.*,
             row_number() over (partition by teamId order by score desc) as seqnum
      from player p
     ) p
where seqnum = 1;
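The window-function query can be tried locally with the minimal sketch below. It assumes SQLite (via Python's built-in sqlite3 module, SQLite 3.25 or later for row_number()) as a stand-in for Hive, and the sample rows are hypothetical. An order by is added only so the output order is deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption: SQLite >= 3.25,
# which supports window functions). The rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player (playerId INTEGER PRIMARY KEY, score INTEGER, teamId INTEGER)"
)
conn.executemany(
    "INSERT INTO player (playerId, score, teamId) VALUES (?, ?, ?)",
    [(1, 70, 10), (2, 95, 10), (3, 88, 20), (4, 60, 20), (5, 81, 30)],
)

# Number the rows within each team by descending score, then keep row 1.
query = """
select p.playerId, p.score, p.teamId
from (select p.*,
             row_number() over (partition by teamId order by score desc) as seqnum
      from player p
     ) p
where seqnum = 1
order by p.teamId
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(2, 95, 10), (3, 88, 20), (5, 81, 30)]
```

Because row_number() assigns distinct numbers even to equal scores, this form always returns exactly one player per team; on a tie, which player gets seqnum = 1 is not guaranteed unless the order by includes a tie-breaking column.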

Doing a separate aggregation and a join is the "old" way of expressing this logic. SQL has become much more powerful over the past few decades.

Answer 2

This works unless two players on the same team share the top score. In that case, it returns two rows for that team.

select a.score, a.teamId, b.playerId
from (
  select max(score) as score, teamId
  from player
  group by teamId
) a
inner join player b
  on a.teamId = b.teamId and a.score = b.score;
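The tie behavior described above can be checked with the minimal sketch below. As before, it assumes SQLite (through Python's sqlite3 module) as a local stand-in for Hive, and the data is hypothetical: players 1 and 2 both have the top score in team 10. An order by is added only to make the output order deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player (playerId INTEGER PRIMARY KEY, score INTEGER, teamId INTEGER)"
)
# Players 1 and 2 tie for the highest score (95) in team 10.
conn.executemany(
    "INSERT INTO player (playerId, score, teamId) VALUES (?, ?, ?)",
    [(1, 95, 10), (2, 95, 10), (3, 70, 10), (4, 88, 20), (5, 60, 20)],
)

# Aggregate the max score per team, then join back to find the matching players.
join_query = """
select a.score, a.teamId, b.playerId
from (
  select max(score) as score, teamId
  from player
  group by teamId
) a
inner join player b
  on a.teamId = b.teamId and a.score = b.score
order by a.teamId, b.playerId
"""
rows = conn.execute(join_query).fetchall()
print(rows)  # [(95, 10, 1), (95, 10, 2), (88, 20, 4)]
```

Team 10 appears twice because both tied players match the aggregated maximum, while team 20, with a single top scorer, appears once.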
