Select the best row in each group based on two columns

Time: 2018-11-21 02:24:57

Tags: mysql greatest-n-per-group

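Suppose we have the following table, where each row represents a user's submission in a programming contest. id is an auto-incrementing primary key, probid identifies the problem the submission was for, score is the number of points the submission earned, and date is the timestamp of the submission. Each user may submit to the same problem as many times as they like: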

+----+----------+--------+-------+------------+
| id | username | probid | score |    date    |
+----+----------+--------+-------+------------+
|  1 | brian    |      1 |     5 | 1542766686 |
|  2 | alex     |      1 |    10 | 1542766686 |
|  3 | alex     |      2 |     5 | 1542766901 |
|  4 | brian    |      1 |    10 | 1542766944 |
|  5 | jacob    |      2 |    10 | 1542766983 |
|  6 | jacob    |      1 |    10 | 1542767053 |
|  7 | brian    |      2 |     8 | 1542767271 |
|  8 | jacob    |      2 |    10 | 1542767456 |
|  9 | brian    |      2 |     7 | 1542767522 |
+----+----------+--------+-------+------------+
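
For anyone who wants to reproduce this, the sample data can be loaded with something like the script below. The schema is not given here, so the table name tablename (borrowed from the answers) and all column types are assumptions; only the column names and the nine rows come from the table above.

```sql
-- Assumed schema: column names match the sample table, types are guesses.
CREATE TABLE tablename (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(32)  NOT NULL,
    probid   INT          NOT NULL,
    score    INT          NOT NULL,
    `date`   INT UNSIGNED NOT NULL  -- Unix timestamp, as in the sample rows
);

-- The nine submissions shown above, with explicit ids.
INSERT INTO tablename (id, username, probid, score, `date`) VALUES
    (1, 'brian', 1,  5, 1542766686),
    (2, 'alex',  1, 10, 1542766686),
    (3, 'alex',  2,  5, 1542766901),
    (4, 'brian', 1, 10, 1542766944),
    (5, 'jacob', 2, 10, 1542766983),
    (6, 'jacob', 1, 10, 1542767053),
    (7, 'brian', 2,  8, 1542767271),
    (8, 'jacob', 2, 10, 1542767456),
    (9, 'brian', 2,  7, 1542767522);
```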

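To rank the contestants, we need to determine each user's best submission for each problem. The "best" submission is the one with the highest score, with ties broken by submission ID (that is, if a user gets the same score on the same problem twice, we only care about the earlier of the two submissions). This would produce a table like the following: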

+----------+--------+----+-------+------------+
| username | probid | id | score |    date    |
+----------+--------+----+-------+------------+
| alex     |      1 |  2 |    10 | 1542766686 |
| alex     |      2 |  3 |     5 | 1542766901 |
| brian    |      1 |  4 |    10 | 1542766944 |
| brian    |      2 |  7 |     8 | 1542767271 |
| jacob    |      1 |  6 |    10 | 1542767053 |
| jacob    |      2 |  5 |    10 | 1542766983 |
+----------+--------+----+-------+------------+

How can I write a query to accomplish this?

4 Answers:

Answer 0: (score: 0)

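-- Sorts each (username, probid) group so that its best row comes first;
-- this alone does not remove the other rows in the group.
SELECT username, probid, id, score, `date`
FROM tableName
ORDER BY username, probid, score DESC, id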

Answer 1: (score: 0)

With MySQL 8.0 or MariaDB 10.2 or later:

SELECT username, probid, id, score, `date`
FROM (
    SELECT username, probid, id, score, `date`,
           ROW_NUMBER() over (
                 PARTITION BY username,probid
                 ORDER BY score DESC, id ASC) as `rank`
    FROM tablename
) as tmp
WHERE tmp.`rank` = 1

Answer 2: (score: 0)

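This query also works on MySQL versions before 8.0. The LEFT JOIN eliminates duplicate scores, ensuring that among rows with equal scores for a given user and problem, only the one with the earliest date remains in the result set. The WHERE clause then ensures that we get the highest score for each user/problem combination: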

SELECT t1.username, t1.probid, t1.id, t1.score, t1.date
FROM tablename t1
LEFT JOIN tablename t2
    ON t2.username = t1.username AND
       t2.probid = t1.probid AND
       t2.score = t1.score AND
       t2.date < t1.date
WHERE t2.id IS NULL AND
      t1.score = (SELECT MAX(score) FROM tablename t3 WHERE t3.username = t1.username AND t3.probid = t1.probid)
ORDER BY t1.username, t1.probid

Update

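It is almost certainly more efficient to JOIN the table to a derived table that first lists the highest score per user per problem, rather than computing the MAX value for every row of the result. The query then becomes: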

SELECT t1.username, t1.probid, t1.id, t1.score, t1.date
FROM tablename t1
JOIN (SELECT username, probid, MAX(score) AS score
      FROM tablename
      GROUP BY username, probid) t2
    ON t2.username = t1.username AND 
       t2.probid = t1.probid AND 
       t2.score = t1.score
LEFT JOIN tablename t3
    ON t3.username = t1.username AND
       t3.probid = t1.probid AND
       t3.score = t1.score AND
       t3.date < t1.date
WHERE t3.id IS NULL
ORDER BY t1.username, t1.probid

Output (for both queries):

username    probid  id  score   date
alex        1       2   10      1542766686
alex        2       3   5       1542766901
brian       1       4   10      1542766944
brian       2       7   8       1542767271
jacob       1       6   10      1542767053
jacob       2       5   10      1542766983

Updated Demo on SQLFiddle
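
As a variation on the same anti-join idea, a single NOT EXISTS can express "no better row exists for this user and problem" directly, and it breaks ties on id as the question specifies rather than on date. This is only a sketch and not part of the queries above; it assumes the same table name, tablename, and uses no window functions.

```sql
-- Keep a row only if no other row for the same user and problem
-- has a higher score, or the same score with a smaller id.
SELECT t1.username, t1.probid, t1.id, t1.score, t1.`date`
FROM tablename t1
WHERE NOT EXISTS (
    SELECT 1
    FROM tablename t2
    WHERE t2.username = t1.username
      AND t2.probid = t1.probid
      AND (t2.score > t1.score
           OR (t2.score = t1.score AND t2.id < t1.id))
)
ORDER BY t1.username, t1.probid;
```

Because id is unique, at most one row per user/problem pair can survive the filter, even if two submissions share both score and timestamp.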

Answer 3: (score: 0)

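In MySQL versions before 8.0.2, we can emulate the functionality of Row_Number() using User-defined Variables. In this technique, we first fetch the data in a particular order (which depends on the problem statement at hand).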

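In your case, within each partition of username and probid, we need to rank the scores in descending order, with rows that have a lower timestamp value taking priority (to break ties). So we ORDER BY username, probid, score DESC, date ASC.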

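Now we can use this result set as a Derived Table and determine the row numbers. This works much like the looping technique we would use in application code (for example, PHP): we store the previous row's values in user-defined variables and use a conditional CASE .. WHEN expression to check the current row's values against the previous row's. We then assign the row number accordingly.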

Finally, we keep only the rows whose row number is 1 and (if needed) sort them by username and probid.


Query

SELECT dt2.username,
       dt2.probid,
       dt2.id,
       dt2.score,
       dt2.date
FROM   (SELECT @rn := CASE
                        WHEN @un = dt1.username
                             AND @pid = dt1.probid THEN @rn + 1
                        ELSE 1
                      end          AS row_no,
               @un := dt1.username AS username,
               @pid := dt1.probid  AS probid,
               dt1.id,
               dt1.score,
               dt1.date
        FROM   (SELECT id,
                       username,
                       probid,
                       score,
                       date
                FROM   your_table
                ORDER  BY username,
                          probid,
                          score DESC,
                          date ASC) AS dt1
               CROSS JOIN (SELECT @un := '',
                                  @pid := 0,
                                  @rn := 0) AS user_init_vars) AS dt2
WHERE  dt2.row_no = 1  
ORDER BY dt2.username, dt2.probid;

Result

| username | probid | id  | score | date       |
| -------- | ------ | --- | ----- | ---------- |
| alex     | 1      | 2   | 10    | 1542766686 |
| alex     | 2      | 3   | 5     | 1542766901 |
| brian    | 1      | 4   | 10    | 1542766944 |
| brian    | 2      | 7   | 8     | 1542767271 |
| jacob    | 1      | 6   | 10    | 1542767053 |
| jacob    | 2      | 5   | 10    | 1542766983 |

View on DB Fiddle