Question

I have a table, tbl_entries, with the following structure:

+----+------+------+------+
| id | col1 | col2 | col3 |
+----+------+------+------+
| 11 |    a |    b |    c |
| 12 |    d |    e |    a |
| 13 |    a |    b |    c |
| 14 |    X |    e |    2 |
| 15 |    a |    b |    c |
+----+------+------+------+

Another table, tbl_reviewlist, has the following structure:

+----+-------+------+------+------+
| id | entid | cola | colb | colc |
+----+-------+------+------+------+
|  1 |    12 |    N |    Y |    Y |
|  2 |    13 |    Y |    N |    Y |
|  3 |    14 |    Y |    N |    N |
+----+-------+------+------+------+

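Basically, tbl_reviewlist holds reviews of the entries in tbl_entries (entid refers to tbl_entries.id). However, for known reasons, the entries in tbl_entries are duplicated. I extract the unique records with the following query: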

SELECT * FROM `tbl_entries` GROUP BY `col1`, `col2`, `col3`;

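However, this returns an arbitrary one of the duplicate rows from tbl_entries, regardless of whether it has been reviewed. I want the query to prefer the rows that have been reviewed. How can I do that?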

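EDIT: I want to prefer rows that have been reviewed, but if none of the rows in a group has been reviewed, one of them should still be returned.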

Thanks in advance!

Answer 1

Have you actually tried anything?

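Hint: the SQL standard requires that every column in the result set of a query with a group by clause be one of the following:

a grouping column,
an aggregate function (sum(), count(), etc.),
a constant value/literal, or
an expression derived solely from the above.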
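
As a minimal sketch of a select list that satisfies these rules, the snippet below loads the question's tbl_entries data into an in-memory SQLite database through Python's sqlite3 module and groups by col1 only. SQLite stands in for MySQL here, and the aliases n, kind and label are names made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_entries (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_entries VALUES (?, ?, ?, ?)",
    [(11, "a", "b", "c"), (12, "d", "e", "a"), (13, "a", "b", "c"),
     (14, "X", "e", "2"), (15, "a", "b", "c")],
)

rows = conn.execute(
    """
    SELECT col1,                              -- grouping column
           COUNT(*) AS n,                     -- aggregate function
           'entry' AS kind,                   -- constant / literal
           col1 || ':' || COUNT(*) AS label   -- expression built only from the above
    FROM tbl_entries
    GROUP BY col1
    ORDER BY col1
    """
).fetchall()
print(rows)
# [('X', 1, 'entry', 'X:1'), ('a', 3, 'entry', 'a:3'), ('d', 1, 'entry', 'd:1')]
```

Every output column is well defined for the whole group, so no row has to be picked arbitrarily.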

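Some broken implementations (I believe MySQL is one of them) allow other columns to be included and supply their own...creative...behavior. If you think about it, group by essentially says to do the following: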

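Order the table by the grouping expressions.
Partition it into groups based on that sequence.
Collapse each such partition into a single row, computing the aggregate expressions.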
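
The three steps can be sketched directly in plain Python. This is only a conceptual model of group by, not how any particular database engine implements it; the rows are the question's tbl_entries data, and the two aggregates computed (a row count and the lowest id) are chosen for illustration.

```python
from itertools import groupby
from operator import itemgetter

# (id, col1, col2, col3) rows from the question's tbl_entries
entries = [
    (11, "a", "b", "c"),
    (12, "d", "e", "a"),
    (13, "a", "b", "c"),
    (14, "X", "e", "2"),
    (15, "a", "b", "c"),
]

key = itemgetter(1, 2, 3)  # the grouping expressions: col1, col2, col3

# Step 1: order the rows by the grouping expressions.
ordered = sorted(entries, key=key)

result = []
# Step 2: partition the ordered sequence into runs with equal keys.
for group_key, partition in groupby(ordered, key=key):
    partition = list(partition)
    # Step 3: collapse the partition to one row: key + count(*) + min(id).
    result.append(group_key + (len(partition), min(row[0] for row in partition)))

print(result)
# [('X', 'e', '2', 1, 14), ('a', 'b', 'c', 3, 11), ('d', 'e', 'a', 1, 12)]
```

After the collapse, a value such as id survives only through an aggregate like min(); there is no single "the id" left for the (a, b, c) group.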

Once you've done that, what does it mean to ask for something that is not uniform across the collapsed group partition?

If you have a table foo with columns A, B, C, D and E, and you say something like

select A,B,C,D,E from foo group by A,B,C

then, per the standard, you should get a compile error. Deviant implementations [usually] treat such a query as [roughly] equivalent to

select *
from foo t
join ( select A,B,C
       from foo
       group by A,B,C
     ) x on x.A = t.A
        and x.B = t.B
        and x.C = t.C

However, I wouldn't necessarily rely on that without reviewing the documentation for the specific implementation you are using.
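
One way to check a given engine is simply to run both forms and compare. The sketch below does that with SQLite through Python's sqlite3 module; it says nothing about MySQL, and the three rows in foo are invented for illustration. In SQLite the two forms are not interchangeable: the bare-column query yields one row per group (with D and E taken from some row of that group), while the join yields every source row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (A TEXT, B TEXT, C TEXT, D INTEGER, E INTEGER)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?, ?, ?)",
    # hypothetical sample rows: two share (A, B, C), one is unique
    [("a", "b", "c", 1, 10), ("a", "b", "c", 2, 20), ("d", "e", "f", 3, 30)],
)

# Non-standard form: D and E are neither grouped nor aggregated.
bare = conn.execute("SELECT A, B, C, D, E FROM foo GROUP BY A, B, C").fetchall()

# The join form from the answer.
joined = conn.execute(
    """
    SELECT t.*
    FROM foo t
    JOIN (SELECT A, B, C FROM foo GROUP BY A, B, C) x
      ON x.A = t.A AND x.B = t.B AND x.C = t.C
    """
).fetchall()

print(len(bare), len(joined))  # 2 3
```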

If you want to find only the entries that have been reviewed, something like this:

select *
from tbl_entries t
where exists ( select *
               from tbl_reviewlist x
               where x.entid = t.id
             )

should do. However, if you want to find reviewed entries that are duplicated on col1, col2 and col3, then something like this should do:

select *
from tbl_entries t
join ( select col1,col2,col3
       from tbl_entries x
       group by col1,col2,col3
       having count(*) > 1
     ) d on d.col1 = t.col1
        and d.col2 = t.col2
        and d.col3 = t.col3
where exists ( select *
               from tbl_reviewlist x
               where x.entid = t.id
             )
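
To see what the two exists queries return, here is a small sketch that runs them against the question's sample data in an in-memory SQLite database via Python's sqlite3 module (an assumption for convenience; the question itself targets MySQL). An order by t.id is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_entries (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT);
    CREATE TABLE tbl_reviewlist (id INTEGER PRIMARY KEY, entid INTEGER,
                                 cola TEXT, colb TEXT, colc TEXT);
    INSERT INTO tbl_entries VALUES
        (11,'a','b','c'), (12,'d','e','a'), (13,'a','b','c'),
        (14,'X','e','2'), (15,'a','b','c');
    INSERT INTO tbl_reviewlist VALUES
        (1,12,'N','Y','Y'), (2,13,'Y','N','Y'), (3,14,'Y','N','N');
    """
)

# Entries that have at least one review.
reviewed = conn.execute(
    """
    SELECT t.id FROM tbl_entries t
    WHERE EXISTS (SELECT * FROM tbl_reviewlist x WHERE x.entid = t.id)
    ORDER BY t.id
    """
).fetchall()

# Reviewed entries whose (col1, col2, col3) occurs more than once.
reviewed_dupes = conn.execute(
    """
    SELECT t.id FROM tbl_entries t
    JOIN (SELECT col1, col2, col3 FROM tbl_entries
          GROUP BY col1, col2, col3 HAVING COUNT(*) > 1) d
      ON d.col1 = t.col1 AND d.col2 = t.col2 AND d.col3 = t.col3
    WHERE EXISTS (SELECT * FROM tbl_reviewlist x WHERE x.entid = t.id)
    ORDER BY t.id
    """
).fetchall()

print(reviewed)        # [(12,), (13,), (14,)]
print(reviewed_dupes)  # [(13,)]
```

Only the (a, b, c) combination is duplicated (ids 11, 13 and 15), and of those only id 13 has a review.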

Since your problem statement is rather unclear, another take on it might be something like this:

select t.col1            ,
       t.col2            ,
       t.col3            ,
       t.duplicate_count ,
       coalesce(x.review_count,0) as review_count
from      ( select col1 ,                       
                   col2 ,                       
                   col3 ,                       
                   count(*) as duplicate_count  
            from tbl_entries
            group by col1 ,
                     col2 ,
                     col3
          ) t
left join ( select cola, colb, colc , count(*) as review_count
            from tbl_reviewlist
            group by cola, colb, colc
            having count(*) > 1
          ) x on x.cola = t.col1
             and x.colb = t.col2
             and x.colc = t.col3
order by sign(coalesce(x.review_count,0)) desc ,
         t.col1 ,
         t.col2 ,
         t.col3

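This query

summarizes the entries table, producing a count of how many times each col1/2/3 combination occurs,
summarizes the review table, producing a review count for each cola/b/c combination,
joins them together, matching columns a:1, b:2, c:3, and
orders them
- first placing reviewed items ahead of unreviewed items,
- then by the col1/2/3 values.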

Answer 2

I think there is a way to do this with less repetition, but this should be a start:

select
  tbl_entries.ID,
  col1,
  col2,
  col3,
  cola -- ... you get the idea ...
from (
  -- per group: the lowest reviewed ID if there is one, else the lowest ID
  select coalesce(min(entid), min(tbl_entries.ID)) as favID
  from tbl_entries left join tbl_reviewlist on entid = tbl_entries.ID
  group by col1, col2, col3
) as A join tbl_entries on tbl_entries.ID = favID
left join tbl_reviewlist on entid = tbl_entries.ID

Basically, you boil the desired output down to a core list of IDs, and then map those back to the data...
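
A sketch of this two-step idea run against the question's sample data, using an in-memory SQLite database through Python's sqlite3 module (SQLite rather than MySQL is an assumption made so the example is self-contained). The table aliases e, r and a, the explicit column list, and the order by are additions for readability and deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_entries (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT);
    CREATE TABLE tbl_reviewlist (id INTEGER PRIMARY KEY, entid INTEGER,
                                 cola TEXT, colb TEXT, colc TEXT);
    INSERT INTO tbl_entries VALUES
        (11,'a','b','c'), (12,'d','e','a'), (13,'a','b','c'),
        (14,'X','e','2'), (15,'a','b','c');
    INSERT INTO tbl_reviewlist VALUES
        (1,12,'N','Y','Y'), (2,13,'Y','N','Y'), (3,14,'Y','N','N');
    """
)

rows = conn.execute(
    """
    SELECT e.id, e.col1, e.col2, e.col3, r.cola, r.colb, r.colc
    FROM (
        -- step 1: one preferred ID per (col1, col2, col3) group
        SELECT COALESCE(MIN(r.entid), MIN(e.id)) AS favID
        FROM tbl_entries e
        LEFT JOIN tbl_reviewlist r ON r.entid = e.id
        GROUP BY e.col1, e.col2, e.col3
    ) AS a
    -- step 2: map the IDs back to the entry and its review, if any
    JOIN tbl_entries e ON e.id = a.favID
    LEFT JOIN tbl_reviewlist r ON r.entid = e.id
    ORDER BY e.id
    """
).fetchall()

for row in rows:
    print(row)
# (12, 'd', 'e', 'a', 'N', 'Y', 'Y')
# (13, 'a', 'b', 'c', 'Y', 'N', 'Y')
# (14, 'X', 'e', '2', 'Y', 'N', 'N')
```

For the duplicated (a, b, c) group the reviewed row 13 is chosen over 11 and 15, because min(entid) ignores the NULLs produced by the left join for unreviewed rows.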

Answer 3

SELECT e.col1, e.col2, e.col3, 
       COALESCE(MIN(r.entid), MIN(e.id)) AS id 
FROM tbl_entries AS e
  LEFT JOIN tbl_reviewlist AS r
    ON r.entid = e.id
GROUP BY e.col1, e.col2, e.col3 ;

Tested at SQL-Fiddle.
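
The fallback asked for in the question's edit (return a row even when nothing in the group was reviewed) can be checked with a small sketch. It runs the query above in an in-memory SQLite database via Python's sqlite3 module, which is an assumption in place of MySQL. Entry 16 with values (q, r, s) is a hypothetical unreviewed row added for illustration, and the result column is renamed picked_id so the order by is unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_entries (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT);
    CREATE TABLE tbl_reviewlist (id INTEGER PRIMARY KEY, entid INTEGER,
                                 cola TEXT, colb TEXT, colc TEXT);
    INSERT INTO tbl_entries VALUES
        (11,'a','b','c'), (12,'d','e','a'), (13,'a','b','c'),
        (14,'X','e','2'), (15,'a','b','c'),
        (16,'q','r','s');  -- hypothetical entry with no review
    INSERT INTO tbl_reviewlist VALUES
        (1,12,'N','Y','Y'), (2,13,'Y','N','Y'), (3,14,'Y','N','N');
    """
)

picked = conn.execute(
    """
    SELECT e.col1, e.col2, e.col3,
           COALESCE(MIN(r.entid), MIN(e.id)) AS picked_id
    FROM tbl_entries AS e
    LEFT JOIN tbl_reviewlist AS r ON r.entid = e.id
    GROUP BY e.col1, e.col2, e.col3
    ORDER BY picked_id
    """
).fetchall()

print(picked)
# [('d', 'e', 'a', 12), ('a', 'b', 'c', 13), ('X', 'e', '2', 14), ('q', 'r', 's', 16)]
```

The (q, r, s) group has no review, so min(r.entid) is NULL and coalesce falls back to min(e.id), which is 16.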

Using 'GROUP BY' while preferring rows that are associated in another table
