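With more than 1,800 questions tagged greatest-n-per-group and some excellent answers, I thought I would find a solution to this problem, but either I have missed it or I need a new approach.
I have a table photo_types that stores each user's votes (up or down) on which photo_type they think a particular photo belongs to. Photo types are 1-10, and each vote is 1 or -1.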
+----+------+----------+------------+------+
| id | user | photo_id | photo_type | vote |
+----+------+----------+------------+------+
| 1 | jane | photo1 | 1 | 1 |
| 2 | jane | photo2 | 2 | 1 |
| 3 | jane | photo3 | 4 | -1 |
| 4 | ben | photo1 | 1 | 1 |
| 5 | ben | photo2 | 3 | -1 |
| 6 | ben | photo2 | 2 | 1 |
| 7 | mary | photo1 | 1 | -1 |
| 8 | mary | photo3 | 10 | 1 |
| 9 | mary | photo2 | 1 | 1 |
| 10 | mary | photo1 | 2 | -1 |
+----+------+----------+------------+------+
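I need to join this table back to the photos table (which contains all the other details for a given photo), but with only the top 2 voted types per photo.
The photos table, which I need to LEFT JOIN to photo_types, looks like this: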
+----+----------+------------+----------------+---------------+------------+
| id | photo_id | photo_name | photographer | location | date |
+----+----------+------------+----------------+---------------+------------+
| 1 | photo1 | the bridge | Bill Murray | Brooklyn, NY | 2012-10-11 |
| 2 | photo2 | the cat | Jacques Chirac | Paris, France | 2013-01-03 |
| 3 | photo3 | a car | the Grinch | London, UK | 2016-09-01 |
+----+----------+------------+----------------+---------------+------------+
I am obviously joining the two tables on photo_id.
To get the highest-voted types for each photo, I tried a subquery like this:
SELECT photo_id, photo_type, sum(vote) AS votes
FROM photo_types
GROUP BY photo_type, photo_id
HAVING votes>0
ORDER BY votes DESC
This sums the votes grouped by photo_type and photo_id. It works fine, but it includes every type with sum(vote) > 0, not just the top 2 voted types.
SQL Fiddle here
When included in the join, it looks like:
SELECT *
FROM photos
LEFT JOIN
(SELECT photo_id, photo_type, sum(vote) AS votes
FROM photo_types
GROUP BY photo_type, photo_id
HAVING votes>0
ORDER BY votes DESC) AS pt
ON photos.photo_id = pt.photo_id
WHERE photos.date > '2010-01-01';
I had hoped to use Bill Karwin's solution, but I cannot join the table to itself on a grouped value (a SUM in my case). The subquery I tried looks like:
SELECT pt1.*, SUM(pt1.vote) AS votes1, SUM(pt2.vote) AS votes2
FROM photo_types AS pt1
LEFT OUTER JOIN photo_types AS pt2
ON pt1.photo_id = pt2.photo_id
AND (votes1 < votes2
OR (votes1 = votes2 AND pt1.id < pt2.id))
WHERE pt2.photo_id IS NULL
...which does not work, because it tries to join the two tables on computed values (unlike Bill's solution).
SQL Fiddle here
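The attempt above refers to votes1 and votes2 inside the ON clause, before those sums exist. A minimal sketch of one possible workaround: compute the sums first, then apply the self-join to the totals. Several things here are assumptions and not from the question: the CTE name totals, breaking ties by the lower photo_type, and the use of Python's sqlite3 as a stand-in for MySQL (CTEs need MySQL 8.0 or later; on older versions the derived table would have to be written out twice). It returns only the single top type per photo.

```python
import sqlite3

# Vote rows copied from the question's photo_types table.
VOTES = [
    (1, "jane", "photo1", 1, 1), (2, "jane", "photo2", 2, 1),
    (3, "jane", "photo3", 4, -1), (4, "ben", "photo1", 1, 1),
    (5, "ben", "photo2", 3, -1), (6, "ben", "photo2", 2, 1),
    (7, "mary", "photo1", 1, -1), (8, "mary", "photo3", 10, 1),
    (9, "mary", "photo2", 1, 1), (10, "mary", "photo1", 2, -1),
]


def make_db():
    """Build an in-memory SQLite database holding the sample votes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE photo_types ("
        "id INTEGER PRIMARY KEY, user TEXT, photo_id TEXT, "
        "photo_type INTEGER, vote INTEGER)"
    )
    conn.executemany("INSERT INTO photo_types VALUES (?, ?, ?, ?, ?)", VOTES)
    return conn


# Aggregate first (totals), then anti-join totals against itself:
# a row survives only if no other row for the same photo beats it.
TOP_TYPE_SQL = """
WITH totals AS (
    SELECT photo_id, photo_type, SUM(vote) AS votes
    FROM photo_types
    GROUP BY photo_id, photo_type
    HAVING SUM(vote) > 0
)
SELECT t1.photo_id, t1.photo_type, t1.votes
FROM totals AS t1
LEFT JOIN totals AS t2
       ON t1.photo_id = t2.photo_id
      AND (t1.votes < t2.votes
           OR (t1.votes = t2.votes AND t1.photo_type > t2.photo_type))
WHERE t2.photo_id IS NULL
ORDER BY t1.photo_id
"""


def top_type_per_photo(conn):
    """Return (photo_id, photo_type, votes) for the top-voted type of each photo."""
    return conn.execute(TOP_TYPE_SQL).fetchall()


if __name__ == "__main__":
    for row in top_type_per_photo(make_db()):
        print(row)
```

This pattern only yields the top 1 per group; getting the top 2 needs a counting or ranking step on top of the totals.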
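Question
Is there a way to get the greatest-n-per-group when the grouping is based on a calculated value such as SUM(xxx)?
Solutions that partially cover this problem are here and here, but they do not involve an aggregate in the grouped value.
Another obvious approach would be to recalculate the top-voted value on every vote and store it directly in the photos table (as discussed here), but unless it is impossible, I would prefer to compute it within the SELECT, for various reasons.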
Answer 0 (score: 1)
If your list of types is limited, the simplest approach is the substring_index() / group_concat() trick:
SELECT photo_id,
SUBSTRING_INDEX(GROUP_CONCAT(photo_type ORDER BY votes DESC), ',', 2) as top2
FROM (SELECT photo_id, photo_type, sum(vote) AS votes
FROM photo_types
GROUP BY photo_type, photo_id
HAVING votes > 0
) pt
GROUP BY photo_id;
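Note: the intermediate string for group_concat() is limited to about 1k (the group_concat_max_len setting, 1024 by default), which should be enough for this problem.
Answer 1 (score: 0)
Look up the xxx APPLY operators (the link below covers CROSS APPLY). They offer more flexibility than sub-aggregate queries alone.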
http://sqlserverplanet.com/sql-2005/cross-apply-explained
Answer 2 (score: 0)
OK, following an old blog post (mentioned a few times in other greatest-n-per-group solutions), the following works:
SELECT pt1.*
FROM
(SELECT id, photo_id, photo_type, sum(vote) AS votes
FROM photo_types
GROUP BY photo_type, photo_id
HAVING votes>0) AS pt1
WHERE (
SELECT COUNT(*)
FROM
(SELECT id, photo_id, photo_type, sum(vote) AS votes
FROM photo_types
GROUP BY photo_type, photo_id
HAVING votes>0) AS pt2
WHERE pt1.photo_id = pt2.photo_id and pt1.votes <= pt2.votes
) <=2
ORDER BY photo_id, votes DESC
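However:
- I am not sure how efficient it is, since it uses two subqueries
- If any of the greatest-n rows have tied values, it does not return the correct number of results (the ties push the count past the specified limit), as you can see in this SqlFiddle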