Question

I am trying to retrieve the "best" possible entry from an SQL table.

Consider a table containing TV shows, with the columns id, title, episode, is_hidef and is_verified. For example:

id title         ep hidef verified
1  The Simpsons  1  True  False
2  The Simpsons  1  True  True
3  The Simpsons  1  True  True
4  The Simpsons  2  False False
5  The Simpsons  2  True  False

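There may be duplicate rows for a single title and episode, and those rows may or may not differ in their boolean fields. There may be more columns holding other information, but that is not important here.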

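I want a result set that gives the best row for each episode (ideally with is_hidef and is_verified both true). Among rows considered "equal", I want the most recent one (natural ordering, or ordering by an arbitrary datetime column). For the data above, the expected result is: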

3  The Simpsons  1  True  True
5  The Simpsons  2  True  False

In the past I would have used the following query:

SELECT * FROM shows WHERE title='The Simpsons' GROUP BY episode ORDER BY is_hidef, is_verified

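This works in MySQL and SQLite, but it goes against the SQL spec (GROUP BY requires aggregates, and so on). I am not really interested in hearing again why MySQL is so bad for allowing this, but I am very interested in an alternative solution that also works on other engines (bonus points if you can give me the Django ORM code for it).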

Thanks =)

Answer 1

Somewhat similar to Andomar's answer, but this one actually works.

select C.*
FROM
(
    select min(ID) minid
    from (
        select distinct title, ep, max(hidef*1 + verified*1) ord
        from tbl
        group by title, ep) a
    inner join tbl b on b.title=a.title and b.ep=a.ep and b.hidef*1 + b.verified*1 = a.ord
    group by a.title, a.ep, a.ord
) D inner join tbl C on D.minid = C.id

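The first-level tiebreak uses *1 to convert a bit (SQL Server) or a MySQL boolean to an integer value, and adds the columns together to produce a "best" score. You can give the columns weights: for example, if hidef matters more than verified, use hidef*2 + verified*1, which yields 3, 2, 1 or 0.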
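As a rough illustration of the weighted score described above, the sketch below loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module and computes hidef*2 + verified*1 per row, then the best score per episode. The table name tbl and the short column names follow this answer's query; storing the booleans as 0/1 integers is an assumption made for the demo.

```python
import sqlite3

# Assumption: booleans are stored as 0/1 integers, as SQLite has no boolean type.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, title TEXT, ep INTEGER, "
    "hidef INTEGER, verified INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (1, "The Simpsons", 1, 1, 0),
        (2, "The Simpsons", 1, 1, 1),
        (3, "The Simpsons", 1, 1, 1),
        (4, "The Simpsons", 2, 0, 0),
        (5, "The Simpsons", 2, 1, 0),
    ],
)

# Weighted score per row: hidef counts double, verified counts once.
scores = conn.execute(
    "SELECT id, hidef*2 + verified*1 AS ord FROM tbl ORDER BY id"
).fetchall()

# Best (highest) score within each title/episode group.
best = conn.execute(
    "SELECT ep, MAX(hidef*2 + verified*1) AS ord "
    "FROM tbl GROUP BY title, ep ORDER BY ep"
).fetchall()

print(scores)  # [(1, 2), (2, 3), (3, 3), (4, 0), (5, 2)]
print(best)    # [(1, 3), (2, 2)]
```

Rows 2 and 3 still tie on the score for episode 1, which is why a second tiebreak level is needed.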

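The second level looks among the rows that share that "best" score and picks the minimum ID (or some other tie-break column). This is essential for reducing a result set with multiple matches to a single record.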

In this particular case (given the table schema), the outer select uses the key directly to retrieve the matching record.
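To try the query end to end, here is a minimal sketch that runs it against the question's sample rows in an in-memory SQLite database. The helper name best_ids and its tiebreak parameter are hypothetical, added only for illustration, and the redundant distinct is dropped. With min(id) as in the answer, episode 1 resolves to id 2; swapping in max(id) picks the most recent row (id 3), which is what the question's expected output shows.

```python
import sqlite3

# The answer's query, with the final tie-break aggregate left as a placeholder.
QUERY_TEMPLATE = """
select C.*
from (
    select {agg}(b.id) as pick_id
    from (
        select title, ep, max(hidef*1 + verified*1) as ord
        from tbl
        group by title, ep
    ) a
    inner join tbl b
        on b.title = a.title and b.ep = a.ep
       and b.hidef*1 + b.verified*1 = a.ord
    group by a.title, a.ep, a.ord
) D
inner join tbl C on D.pick_id = C.id
order by C.ep
"""


def best_ids(conn, tiebreak="min"):
    """Hypothetical helper: return the chosen row id per episode."""
    if tiebreak not in ("min", "max"):  # whitelist, since it is formatted into SQL
        raise ValueError("tiebreak must be 'min' or 'max'")
    rows = conn.execute(QUERY_TEMPLATE.format(agg=tiebreak)).fetchall()
    return [row[0] for row in rows]


# Assumption: booleans stored as 0/1 integers for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, title TEXT, ep INTEGER, "
    "hidef INTEGER, verified INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (1, "The Simpsons", 1, 1, 0),
        (2, "The Simpsons", 1, 1, 1),
        (3, "The Simpsons", 1, 1, 1),
        (4, "The Simpsons", 2, 0, 0),
        (5, "The Simpsons", 2, 1, 0),
    ],
)

oldest_first = best_ids(conn, "min")  # the answer's choice
newest_first = best_ids(conn, "max")  # matches "most recent" in the question
print(oldest_first)  # [2, 5]
print(newest_first)  # [3, 5]
```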

Answer 2

This is basically a form of the groupwise-maximum-with-ties problem. I don't think there is a solution that conforms to the SQL standard. A solution like this can perform well:

SELECT  s2.id
,       s2.title
,       s2.episode
,       s2.is_hidef
,       s2.is_verified
FROM    (
        select  distinct title
        ,       episode
        from    shows
        where   title = 'The Simpsons' 
        ) s1
JOIN    shows s2
ON      s2.id = 
        (
        select  id
        from    shows s3
        where   s3.title = s1.title
                and s3.episode = s1.episode
        order by
                s3.is_hidef DESC
        ,       s3.is_verified DESC
        limit   1
        )

But considering the cost in readability, I would stick with your original query.
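For anyone who wants to experiment with the correlated-subquery approach, the following sketch runs it against the question's sample rows in an in-memory SQLite database. One deliberate change, not part of the answer as posted: s3.id DESC is appended to the inner ORDER BY so that ties resolve to the most recent row deterministically. Note also that LIMIT is not available on every engine (SQL Server uses TOP, for instance), so this is a demo under SQLite rather than a portable solution.

```python
import sqlite3

ANSWER_2_SQL = """
SELECT s2.id, s2.title, s2.episode, s2.is_hidef, s2.is_verified
FROM (
    SELECT DISTINCT title, episode
    FROM shows
    WHERE title = 'The Simpsons'
) s1
JOIN shows s2
  ON s2.id = (
        SELECT s3.id
        FROM shows s3
        WHERE s3.title = s1.title
          AND s3.episode = s1.episode
        ORDER BY s3.is_hidef DESC,
                 s3.is_verified DESC,
                 s3.id DESC          -- added here: newest row wins a tie
        LIMIT 1
     )
ORDER BY s2.episode
"""

# Assumption: booleans stored as 0/1 integers for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, episode INTEGER, "
    "is_hidef INTEGER, is_verified INTEGER)"
)
conn.executemany(
    "INSERT INTO shows VALUES (?, ?, ?, ?, ?)",
    [
        (1, "The Simpsons", 1, 1, 0),
        (2, "The Simpsons", 1, 1, 1),
        (3, "The Simpsons", 1, 1, 1),
        (4, "The Simpsons", 2, 0, 0),
        (5, "The Simpsons", 2, 1, 0),
    ],
)

result = conn.execute(ANSWER_2_SQL).fetchall()
for row in result:
    print(row)
# (3, 'The Simpsons', 1, 1, 1)
# (5, 'The Simpsons', 2, 1, 0)
```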

Alternative to using GROUP BY without aggregates to retrieve distinct "best" results
