Problem computing the MAX value of a column per distinct user

Time: 2019-07-05 12:13:32

Tags: presto amazon-athena

I have a table named "levels" in Amazon Athena with the columns "user", "levelstarted", and "startedcount". The table looks like this:

+------+---------------+--------------+
| user | levelstarted  | startedcount |
+------+---------------+--------------+
| A    | 0050          | 2            |
| A    | 0051          | 1            |
| A    | 0052          | 3            |
| B    | 0030          | 1            |
| B    | 0031          | 2            |
| B    | 0032          | 5            |
| C    | 0010          | 6            |
| C    | 0011          | 3            |
| C    | 0012          | 3            |
+------+---------------+--------------+

For each user, I want to find the highest level started and the number of times the player started that level. I expect a result like this:

+------+----------------+----------------+
| user | highestlevel   | startedcount   |
+------+----------------+----------------+
| A    | 0052           | 3              |
| B    | 0032           | 5              |
| C    | 0012           | 3              |
+------+----------------+----------------+

Finding the highest level started works fine:

SELECT
 DISTINCT user as payer,
 MAX(levelstarted) as levelstarted
FROM "levels"
GROUP BY user

But when I add the started count, the result contains duplicate users:

SELECT
 DISTINCT user as payer,
 MAX(levelstarted) as levelstarted,
 startedcount
FROM "levels"
GROUP BY user, startedcount

1 Answer:

Answer 0 (score: 0)

In Athena / Presto, you can use the max_by function to find the value associated with the maximum value of another column:

SELECT
  user,
  MAX(levelstarted) AS highestlevel,
  MAX_BY(startedcount, levelstarted) AS startedcount
FROM (VALUES ('A', '0050', 2),
             ('A', '0051', 1),
             ('A', '0052', 3),
             ('B', '0030', 1),
             ('B', '0031', 2),
             ('B', '0032', 5),
             ('C', '0010', 6),
             ('C', '0011', 3),
             ('C', '0012', 3)
) AS v (user, levelstarted, startedcount)
GROUP BY user
ORDER BY user
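
As for why the attempt in the question returns duplicate users: grouping by both user and startedcount produces one row per (user, startedcount) pair, and DISTINCT applies to the whole row, not just the first column. Grouping by user alone, as above, yields exactly one row per user. If more than one column from the "winning" row is needed, a window function is a common alternative to max_by. The following is only a sketch, assuming the table and column names from the question; the alias names ranked and rn are made up for illustration:

```sql
-- Alternative sketch: number each user's rows from highest to lowest level,
-- then keep only the first row per user.
-- "user" is quoted defensively; the query above uses it unquoted.
SELECT
  "user",
  levelstarted AS highestlevel,
  startedcount
FROM (
  SELECT
    "user",
    levelstarted,
    startedcount,
    -- rn = 1 marks the row with the highest levelstarted for each user
    ROW_NUMBER() OVER (PARTITION BY "user" ORDER BY levelstarted DESC) AS rn
  FROM levels
) AS ranked
WHERE rn = 1
ORDER BY "user"
```

Both approaches assume levelstarted is unique per user. If two rows for the same user share the highest level, which startedcount is returned is not guaranteed unless a tie-breaker is added to the ORDER BY of the window.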