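Select rows with the maximum value, grouped by two columns

Time: 2012-02-20 11:03:51

Tags: mysql sql greatest-n-per-group

I have seen quite a few solutions for this kind of problem (especially this one: SQL Select only rows with Max Value on a Column), but none of them seem to fit:

I have the following table layout, a versioning scheme for attachments, which are bound to entities: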

TABLE attachments
+------+--------------+----------+----------------+---------------+
| id   | entitiy_id   | group_id | version_number | filename      |
+------+--------------+----------+----------------+---------------+
| 1    | 1            | 1        | 1              | file1-1.pdf   |
| 2    | 1            | 1        | 2              | file1-2.pdf   |
| 3    | 1            | 2        | 1              | file2-1.pdf   |
| 4    | 2            | 1        | 1              | file1-1.pdf   |
| 5    | 2            | 1        | 2              | file1-2.pdf   |
| 6    | 2            | 3        | 1              | file3-1.pdf   |
+------+--------------+----------+----------------+---------------+

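The output should be the rows with the maximum version number for each combination of group_id and entity_id. If it helps, I only need the list for a single entity_id: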

+------+--------------+----------+----------------+---------------+
| id   | entitiy_id   | group_id | version_number | filename      |
+------+--------------+----------+----------------+---------------+
| 2    | 1            | 1        | 2              | file1-2.pdf   |
| 3    | 1            | 2        | 1              | file2-1.pdf   |
| 5    | 2            | 1        | 2              | file1-2.pdf   |
| 6    | 2            | 3        | 1              | file3-1.pdf   |
+------+--------------+----------+----------------+---------------+

What I have come up with is this self-join:

SELECT *
FROM   `attachments` `attachments`
       LEFT OUTER JOIN attachments t2
         ON ( attachments.group_id = t2.group_id
              AND attachments.version_number < t2.version_number )
WHERE  ( t2.group_id IS NULL )
   AND ( `t2`.`id` = 1 )
GROUP  BY t2.group_id

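However, this only works if different entities don't share the same group numbers. Unfortunately, they need to be able to.

I came across a working solution that relies on creating a view, but that is not supported in my current setup.

Any ideas are highly appreciated. Thanks!

4 answers:

Answer 0 (score: 3)

Try this: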

-- column spelled entitiy_id, as in the question's table
select t1.* from attachments t1
left join attachments t2
on t1.entitiy_id = t2.entitiy_id and t1.group_id = t2.group_id and
   t1.version_number < t2.version_number
where t2.version_number is null
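
To check the anti-join against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite as a stand-in for MySQL and that the column is spelled entitiy_id, as in the question's table; the helper names make_db and latest_ids are made up for this illustration.

```python
import sqlite3

# Sample rows copied from the question's attachments table.
ROWS = [
    (1, 1, 1, 1, "file1-1.pdf"),
    (2, 1, 1, 2, "file1-2.pdf"),
    (3, 1, 2, 1, "file2-1.pdf"),
    (4, 2, 1, 1, "file1-1.pdf"),
    (5, 2, 1, 2, "file1-2.pdf"),
    (6, 2, 3, 1, "file3-1.pdf"),
]

# A row survives only if no row with the same entity and group
# has a higher version_number (the LEFT JOIN finds no partner).
ANTI_JOIN = """
SELECT t1.id
FROM attachments t1
LEFT JOIN attachments t2
  ON  t1.entitiy_id = t2.entitiy_id
  AND t1.group_id = t2.group_id
  AND t1.version_number < t2.version_number
WHERE t2.version_number IS NULL
ORDER BY t1.id
"""


def make_db():
    """Build an in-memory database holding the sample table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attachments ("
        "id INTEGER PRIMARY KEY, entitiy_id INTEGER, group_id INTEGER, "
        "version_number INTEGER, filename TEXT)"
    )
    conn.executemany("INSERT INTO attachments VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def latest_ids(conn):
    """Return the ids of the newest attachment per (entitiy_id, group_id)."""
    return [row[0] for row in conn.execute(ANTI_JOIN)]


if __name__ == "__main__":
    print(latest_ids(make_db()))
```

The ids returned match the id column of the expected output in the question. Joining on both entitiy_id and group_id is what keeps entities that share a group number apart.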

Answer 1 (score: 2)

This works for selecting all of them:

SELECT attachments.*
FROM (
    SELECT entitiy_id, group_id, MAX(version_number) AS max_version
    FROM attachments
    GROUP BY entitiy_id, group_id
) AS maxVersions
INNER JOIN attachments
ON attachments.entitiy_id = maxVersions.entitiy_id
AND attachments.group_id = maxVersions.group_id
AND attachments.version_number = maxVersions.max_version

Extending this to look at a single entitiy_id only involves adding a WHERE to the subquery, which gives:

SELECT attachments.*
FROM (
    SELECT entitiy_id, group_id, MAX(version_number) AS max_version
    FROM attachments
    WHERE entitiy_id = [[YOUR ENTITIY ID HERE]]
    GROUP BY entitiy_id, group_id
) AS maxVersions
INNER JOIN attachments
ON attachments.entitiy_id = maxVersions.entitiy_id
AND attachments.group_id = maxVersions.group_id
AND attachments.version_number = maxVersions.max_version

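If you want this to keep running quickly as the number of rows grows, I would advise adding a key on attachments covering (entitiy_id, group_id, version_number). Note that max_version is only an alias inside the subquery, so the key has to use the underlying version_number column. The subquery can then rely on that key, which should help keep it from scanning, and tying up, the whole table.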
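
As a rough sketch of that suggestion, the snippet below creates such a key and runs the single-entity query against the question's sample rows. It uses Python's sqlite3 as a stand-in for MySQL (an assumption). The index name idx_attachments_entity_group_version and the helpers make_db and latest_for_entity are hypothetical, and the `?` placeholder takes the place of [[YOUR ENTITIY ID HERE]].

```python
import sqlite3

# Sample rows copied from the question's attachments table.
ROWS = [
    (1, 1, 1, 1, "file1-1.pdf"),
    (2, 1, 1, 2, "file1-2.pdf"),
    (3, 1, 2, 1, "file2-1.pdf"),
    (4, 2, 1, 1, "file1-1.pdf"),
    (5, 2, 1, 2, "file1-2.pdf"),
    (6, 2, 3, 1, "file3-1.pdf"),
]

# Composite key over the grouping columns plus the versioned column.
# The index name is an assumption made for this sketch.
CREATE_INDEX = """
CREATE INDEX idx_attachments_entity_group_version
ON attachments (entitiy_id, group_id, version_number)
"""

# Same shape as the query above, with the entity id bound as a parameter.
LATEST_FOR_ENTITY = """
SELECT attachments.id, attachments.filename
FROM (
    SELECT entitiy_id, group_id, MAX(version_number) AS max_version
    FROM attachments
    WHERE entitiy_id = ?
    GROUP BY entitiy_id, group_id
) AS maxVersions
INNER JOIN attachments
  ON  attachments.entitiy_id = maxVersions.entitiy_id
  AND attachments.group_id = maxVersions.group_id
  AND attachments.version_number = maxVersions.max_version
ORDER BY attachments.id
"""


def make_db():
    """Build the sample table and add the composite index (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attachments ("
        "id INTEGER PRIMARY KEY, entitiy_id INTEGER, group_id INTEGER, "
        "version_number INTEGER, filename TEXT)"
    )
    conn.executemany("INSERT INTO attachments VALUES (?, ?, ?, ?, ?)", ROWS)
    conn.execute(CREATE_INDEX)
    return conn


def latest_for_entity(conn, entity_id):
    """Return (id, filename) of the newest attachment per group for one entity."""
    return conn.execute(LATEST_FOR_ENTITY, (entity_id,)).fetchall()


if __name__ == "__main__":
    print(latest_for_entity(make_db(), 1))
```

The CREATE INDEX statement as written should also be accepted by MySQL. Whether the optimizer actually uses the key depends on the engine and the data, so it is worth confirming with EXPLAIN on the real table.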

Answer 2 (score: 2)

This should do the trick:

select a1.* from attachments a1
inner join ( select entitiy_id, group_id, max(version_number) as version_number
             from attachments
             group by entitiy_id, group_id) a2 on a1.entitiy_id = a2.entitiy_id and
                                                  a1.group_id = a2.group_id and
                                                  a1.version_number = a2.version_number

Answer 3 (score: 0)

You could also solve this by using a highly performant Common Table Expression (CTE).

WITH CTE AS
(
SELECT entitiy_id, group_id, version_number, filename,       
ROW_NUMBER() OVER (PARTITION BY entitiy_id, group_id ORDER BY version_number DESC) as RowNum
FROM attachments
)
SELECT entitiy_id, group_id, version_number, filename
FROM CTE 
WHERE RowNum = 1

The same query, written with a derived table instead of a CTE:

SELECT T.entitiy_id, T.group_id, T.version_number, T.filename
FROM (SELECT entitiy_id, group_id, version_number, filename,       
     ROW_NUMBER() OVER (PARTITION BY entitiy_id, group_id ORDER BY version_number DESC) as RowNum
     FROM attachments
     ) as T 
WHERE RowNum = 1
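
Both forms depend on window-function support. In MySQL that means version 8.0 or later, so they would not have run on the MySQL releases available when the question was asked. The sketch below tries the derived-table form on the question's sample rows using Python's sqlite3 as a stand-in for MySQL (an assumption; the bundled SQLite must be 3.25 or later for ROW_NUMBER). It also selects id, which the queries above leave out, so the result can be compared with the expected output. The helper names make_db and latest_rows are made up for this illustration.

```python
import sqlite3

# Sample rows copied from the question's attachments table.
ROWS = [
    (1, 1, 1, 1, "file1-1.pdf"),
    (2, 1, 1, 2, "file1-2.pdf"),
    (3, 1, 2, 1, "file2-1.pdf"),
    (4, 2, 1, 1, "file1-1.pdf"),
    (5, 2, 1, 2, "file1-2.pdf"),
    (6, 2, 3, 1, "file3-1.pdf"),
]

# Number the rows inside each (entitiy_id, group_id) partition from the
# highest version down, then keep only the first row of each partition.
LATEST_PER_GROUP = """
SELECT id, entitiy_id, group_id, version_number, filename
FROM (
    SELECT id, entitiy_id, group_id, version_number, filename,
           ROW_NUMBER() OVER (
               PARTITION BY entitiy_id, group_id
               ORDER BY version_number DESC
           ) AS RowNum
    FROM attachments
) AS T
WHERE RowNum = 1
ORDER BY id
"""


def make_db():
    """Build an in-memory database holding the sample table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE attachments ("
        "id INTEGER PRIMARY KEY, entitiy_id INTEGER, group_id INTEGER, "
        "version_number INTEGER, filename TEXT)"
    )
    conn.executemany("INSERT INTO attachments VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def latest_rows(conn):
    """Return the full newest row per (entitiy_id, group_id)."""
    return conn.execute(LATEST_PER_GROUP).fetchall()


if __name__ == "__main__":
    for row in latest_rows(make_db()):
        print(row)
```

One difference from the join-based answers: if two rows in the same partition tie on version_number, ROW_NUMBER keeps exactly one of them, while the MAX-and-join queries return both.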