In a subquery

Time: 2017-01-05 00:26:34

Tags: mysql sql join greatest-n-per-group

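greatest-n-per-group has over 1800 tagged questions and some excellent answers, and I thought I had found the solution to this problem among them, but either I have missed it or I need a new approach.

I have a table photo_types that stores users' votes: each user votes up or down on the photo_type they think a particular photo belongs to. Photo types are numbered 1-10, and each vote is either 1 or -1.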

+----+------+----------+------------+------+
| id | user | photo_id | photo_type | vote |
+----+------+----------+------------+------+
|  1 | jane |   photo1 |          1 |    1 |
|  2 | jane |   photo2 |          2 |    1 |
|  3 | jane |   photo3 |          4 |   -1 |
|  4 |  ben |   photo1 |          1 |    1 |
|  5 |  ben |   photo2 |          3 |   -1 |
|  6 |  ben |   photo2 |          2 |    1 |
|  7 | mary |   photo1 |          1 |   -1 |
|  8 | mary |   photo3 |         10 |    1 |
|  9 | mary |   photo2 |          1 |    1 |
| 10 | mary |   photo1 |          2 |   -1 |
+----+------+----------+------------+------+

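I need to join this table back to the photos table (which holds all the other details of a given photo), but with only the top 2 voted types per photo.

I need to LEFT JOIN the photos table to the photo_types table. The photos table looks like this: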

+----+----------+------------+----------------+---------------+------------+
| id | photo_id | photo_name |   photographer |      location |       date |
+----+----------+------------+----------------+---------------+------------+
|  1 |   photo1 | the bridge |    Bill Murray |  Brooklyn, NY | 2012-10-11 |
|  2 |   photo2 |    the cat | Jacques Chirac | Paris, France | 2013-01-03 |
|  3 |   photo3 |      a car |     the Grinch |    London, UK | 2016-09-01 |
+----+----------+------------+----------------+---------------+------------+

I am obviously joining the two tables on photo_id.

To get the top voted types per photo, I tried a subquery like this:

SELECT photo_id, photo_type, sum(vote) AS votes
FROM photo_types
GROUP BY photo_type, photo_id
HAVING votes>0
ORDER BY votes DESC

This groups the vote sums by photo_type and photo_id. It works, but it returns every type with sum(vote) > 0, not just the top 2 voted types per photo.

SQL Fiddle here
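To make that behaviour concrete, here is a minimal sketch that loads the sample photo_types rows and runs the same aggregate. It uses Python's built-in sqlite3 module purely as a convenient stand-in for MySQL (an assumption of this sketch, not something from the original post), and it repeats SUM(vote) in HAVING instead of relying on the alias so the query stays portable.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photo_types ("
    "id INTEGER PRIMARY KEY, user TEXT, photo_id TEXT, "
    "photo_type INTEGER, vote INTEGER)"
)

# The ten sample rows from the photo_types table in the question
conn.executemany(
    "INSERT INTO photo_types VALUES (?, ?, ?, ?, ?)",
    [
        (1, "jane", "photo1", 1, 1),
        (2, "jane", "photo2", 2, 1),
        (3, "jane", "photo3", 4, -1),
        (4, "ben", "photo1", 1, 1),
        (5, "ben", "photo2", 3, -1),
        (6, "ben", "photo2", 2, 1),
        (7, "mary", "photo1", 1, -1),
        (8, "mary", "photo3", 10, 1),
        (9, "mary", "photo2", 1, 1),
        (10, "mary", "photo1", 2, -1),
    ],
)

# Net votes per (photo, type), keeping only positive totals.
# The extra ORDER BY columns just make the row order deterministic.
rows = conn.execute(
    """
    SELECT photo_id, photo_type, SUM(vote) AS votes
    FROM photo_types
    GROUP BY photo_id, photo_type
    HAVING SUM(vote) > 0
    ORDER BY votes DESC, photo_id, photo_type
    """
).fetchall()

for row in rows:
    print(row)
```

With the sample data only four (photo, type) pairs have a positive total, and photo2 keeps two of them. Nothing in the query caps the number of types per photo, which is the missing piece.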

When included in the join, it looks like:

SELECT * 
FROM photos
LEFT JOIN
    (SELECT photo_id, photo_type, sum(vote) AS votes
    FROM photo_types
    GROUP BY photo_type, photo_id
    HAVING votes>0
    ORDER BY votes DESC) AS pt
ON photos.photo_id = pt.photo_id
WHERE photos.date > '2010-01-01';

SQL Fiddle here

I had hoped to use Bill Karwin's solution, but I cannot work out how to join the table to itself on a grouped value (a SUM in my case). The subquery I tried looks like:

SELECT pt1.*, SUM(pt1.vote) AS votes1, SUM(pt2.vote) AS votes2
FROM photo_types AS pt1
LEFT OUTER JOIN photo_types AS pt2
    ON pt1.photo_id = pt2.photo_id
        AND (votes1 < votes2
        OR (votes1 = votes2 AND pt1.id < pt2.id))
WHERE pt2.photo_id IS NULL

...which does not work, because it tries to join the two tables on a calculated value (unlike Bill's solution).
SQL Fiddle here

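Question
Is there a way to get greatest-n-per-group when the grouping is based on a calculated value such as SUM(xxx)?

Solutions that partly cover this problem are here and here, but they do not involve an aggregate in the grouped value.

Another obvious approach would be to recalculate the top voted value every time a vote is cast and store it directly in the photos table (as discussed here), but unless doing it in the query is impossible, I would prefer, for various reasons, to calculate it within the SELECT.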

3 Answers:

Answer 0: (score: 1)

If your lists are limited in size, the simplest approach is the substring_index() / group_concat() trick:

SELECT photo_id,
       SUBSTRING_INDEX(GROUP_CONCAT(photo_type ORDER BY votes DESC), ',', 2) as top2
FROM (SELECT photo_id, photo_type, sum(vote) AS votes
      FROM photo_types
      GROUP BY photo_type, photo_id
      HAVING votes > 0
     ) pt
GROUP BY photo_id;

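Notes:

  • The intermediate string built by group_concat() is limited to about 1k (the group_concat_max_len setting, 1024 bytes by default), which should be enough for this problem.
  • The alternatives (as you have discovered) either use variables or are much more complicated queries.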
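For anyone unsure what the two functions do to the data, here is a rough Python emulation of the query above, run on the sample rows from the question. It is only a sketch: substring_index below is a hypothetical helper that mimics MySQL's SUBSTRING_INDEX for a positive count, and the dictionaries stand in for the GROUP BY steps. It is not MySQL itself.

```python
from collections import defaultdict

# (photo_id, photo_type, vote) taken from the sample photo_types rows 1-10
votes = [
    ("photo1", 1, 1), ("photo2", 2, 1), ("photo3", 4, -1), ("photo1", 1, 1),
    ("photo2", 3, -1), ("photo2", 2, 1), ("photo1", 1, -1), ("photo3", 10, 1),
    ("photo2", 1, 1), ("photo1", 2, -1),
]


def substring_index(text, delim, count):
    """Hypothetical helper: mimic SUBSTRING_INDEX for a positive count only."""
    return delim.join(text.split(delim)[:count])


# Inner query: SUM(vote) per (photo_id, photo_type)
sums = defaultdict(int)
for photo_id, photo_type, vote in votes:
    sums[(photo_id, photo_type)] += vote

# HAVING votes > 0, collected per photo
per_photo = defaultdict(list)
for (photo_id, photo_type), total in sums.items():
    if total > 0:
        per_photo[photo_id].append((total, photo_type))

# Outer query: GROUP_CONCAT(photo_type ORDER BY votes DESC), then keep 2 items
top2 = {}
for photo_id, pairs in per_photo.items():
    pairs.sort(key=lambda pair: -pair[0])  # highest net votes first
    concatenated = ",".join(str(photo_type) for _, photo_type in pairs)
    top2[photo_id] = substring_index(concatenated, ",", 2)

print(top2)
```

Note that top2 is a comma-separated string per photo rather than one row per type, so anything that needs the individual types has to split it again.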

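Answer 1: (score: 0)

Look up the APPLY operators (CROSS APPLY is explained at the link below). They give you more flexibility than just doing sub-aggregate queries.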

http://sqlserverplanet.com/sql-2005/cross-apply-explained

Answer 2: (score: 0)

OK, following an old blog post (mentioned a few times in other greatest-n-per-group solutions), the following works:

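SELECT pt1.*
FROM 
  -- net votes per (photo, type); id is left out because it is neither grouped
  -- nor aggregated, which ONLY_FULL_GROUP_BY would reject
  (SELECT photo_id, photo_type, sum(vote) AS votes
  FROM photo_types
  GROUP BY photo_type, photo_id
  HAVING votes>0) AS pt1
WHERE (
  -- how many types of the same photo have at least as many votes as this one
  SELECT COUNT(*) 
  FROM 
    (SELECT photo_id, photo_type, sum(vote) AS votes
    FROM photo_types
    GROUP BY photo_type, photo_id
    HAVING votes>0) AS pt2
  WHERE pt1.photo_id = pt2.photo_id and pt1.votes <= pt2.votes
) <=2
ORDER BY photo_id, votes DESC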

see SqlFiddle here

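However:
- I'm not sure how efficient it is, since it uses two subqueries.
- If any of the greatest-n rows share the same value, it does not return the correct number of results, because the tied rows push the count past the specified limit, as you can see in this SqlFiddle.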
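To make the tie problem concrete, here is a small sketch that runs the counting approach next to a different technique, a ROW_NUMBER() window function with an explicit tie-breaker. Several things here are assumptions of the sketch rather than anything from this thread: the votes are hypothetical, Python's built-in sqlite3 stands in for MySQL, and window functions and CTEs need MySQL 8.0+ (or SQLite 3.25+), so this would not have run on the MySQL versions current when the question was asked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photo_types (photo_id TEXT, photo_type INTEGER, vote INTEGER)"
)

# Hypothetical votes (not the thread's data): photoA nets 3, 2, 2 for types 1-3,
# so second place is tied; photoB nets 1 for type 1 and 0 for type 2.
hypothetical_votes = (
    [("photoA", 1, 1)] * 3 + [("photoA", 2, 1)] * 2 + [("photoA", 3, 1)] * 2
    + [("photoB", 1, 1), ("photoB", 2, 1), ("photoB", 2, -1)]
)
conn.executemany("INSERT INTO photo_types VALUES (?, ?, ?)", hypothetical_votes)

# Shared aggregate: net votes per (photo, type), positive totals only
NET_VOTES = """
    SELECT photo_id, photo_type, SUM(vote) AS votes
    FROM photo_types
    GROUP BY photo_id, photo_type
    HAVING SUM(vote) > 0
"""

# Counting approach: keep a row if at most 2 rows of the same photo have
# votes >= its own. Tied rows count each other, so both can drop out.
count_based = conn.execute(f"""
    WITH pt AS ({NET_VOTES})
    SELECT pt1.photo_id, pt1.photo_type, pt1.votes
    FROM pt AS pt1
    WHERE (SELECT COUNT(*) FROM pt AS pt2
           WHERE pt2.photo_id = pt1.photo_id
             AND pt2.votes >= pt1.votes) <= 2
    ORDER BY pt1.photo_id, pt1.votes DESC, pt1.photo_type
""").fetchall()

# Window-function approach: number the rows per photo, breaking ties by
# photo_type, then keep the first two.
row_numbered = conn.execute(f"""
    WITH pt AS ({NET_VOTES}),
    ranked AS (
        SELECT photo_id, photo_type, votes,
               ROW_NUMBER() OVER (PARTITION BY photo_id
                                  ORDER BY votes DESC, photo_type) AS rn
        FROM pt
    )
    SELECT photo_id, photo_type, votes
    FROM ranked
    WHERE rn <= 2
    ORDER BY photo_id, rn
""").fetchall()

print(count_based)
print(row_numbered)
```

On this data the counting query returns a single row for photoA, because the two tied types each see three rows at or above their own total. The ROW_NUMBER() version returns two rows for photoA, with the tie decided by the lower photo_type. Which tie-breaker is right depends on the application; photo_type is just the choice made for this sketch.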