Question

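I'm working on a query to find duplicate values across several columns, so I'll start with a single part of the query to make the explanation clearer.

In the end, all I need to know is whether any of these 4 columns contains duplicates, and which column they are in.

Here is the single query: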

select  count(*) as cnt, 'CUST_REF' as what_column
 from sometable 
  where status != 'whateverStatus' 
    and custm_id = 1234
 group by cust_ref having count(cust_ref) > 1;

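This works well, except that the output is 2 rows. It looks like the first row is the total number of matches > 1 in that column, and the next row is the actual duplicate count, like this: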

cnt what_column
9440    CUST_REF
2   CUST_REF

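My question is: how can I get only the second row, without the total for the column? (The value 2 is correct for this column.) That is, I only want: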

cnt what_column    
2   CUST_REF

Putting it all together:

I combine all of these with UNION, so for the 4 columns it looks like this:

select  count(*) as cnt, 'CUST_REF' as what_column
 from sometable 
  where status != 'whateverStatus' 
    and custm_id = 1234
 group by cust_ref having count(cust_ref) > 1
 union
 select  count(*) as cnt, 'CUST_PO' as what_column
 from sometable 
  where status != 'whateverStatus' 
    and custm_id = 1234
 group by cust_po having count(cust_po) > 1
  union
 select count(*) as cnt, 'SHIP_BL' as what_column
 from sometable 
  where status != 'whateverStatus' 
    and custm_id = 1234
 group by ship_bl having count(ship_bl) > 1
  union
 select count(*) as cnt, 'CUST_SHIPID' as what_column
 from sometable 
  where status != 'whateverStatus' 
    and custm_id = 1234
 group by cust_shipid having count(cust_shipid) > 1;

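The output is shown below. Here I would like to collapse the rows so that each column showing duplicates appears only once, and also leave out the totals.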

cnt what_column
9440    CUST_REF
2   CUST_REF
332 CUST_PO
3   CUST_PO
2   CUST_PO
8   CUST_PO
4   CUST_PO
9   CUST_PO
37  CUST_PO
6   CUST_PO
5   CUST_PO
7   CUST_PO
11  CUST_PO
6609    SHIP_BL
2   SHIP_BL
5   SHIP_BL
8   SHIP_BL
3   SHIP_BL
4   SHIP_BL
6   SHIP_BL
7   SHIP_BL
9183    CUST_SHIPID
2   CUST_SHIPID
3   CUST_SHIPID
6   CUST_SHIPID

Again, in the end I only need to know whether any of these 4 columns has duplicates, and which column they are in.

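For those commenting below: I can't share the table data. However, after adding the column from the HAVING clause back into the SELECT list, let's look at it this way: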

select cust_ref as val, count(*) as cnt, 'CUST_REF' as what_column
     from sometable 
      where status != 'whateverStatus' 
        and custm_id = 1234
     group by cust_ref having count(cust_ref) > 1;

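All the column names in the HAVING clauses are actual column names in the table; what_column is just an alias that shows me which column/query the duplicates were found in.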

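So say the data looks like this, with the duplicates in the first two columns marked with asterisks (I was hoping that would bold them):

id  | cust_ref | cust_po  | ship_bl | cust_shipid
997 | **1234** | 9656     | 5656    | 9876
998 | **1234** | **6353** | 2436    | 9394
999 | 4327     | **6353** | 4388    | 4353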

I'm pretty sure I would end up with:

val cnt what_column
      3 CUST_REF
1234  2 CUST_REF
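
For anyone who wants to try this locally, here is a minimal sketch that loads the three sample rows into a throwaway table. The table name demo_sometable, the column types, and the status value 'open' are assumptions made for illustration; none of them come from the real schema.

```sql
-- Hypothetical schema: types and the 'open' status are assumed, not taken from the real table.
DROP TABLE IF EXISTS demo_sometable;
CREATE TABLE demo_sometable (
    id          INT PRIMARY KEY,
    custm_id    INT,
    status      VARCHAR(20),
    cust_ref    VARCHAR(20),
    cust_po     VARCHAR(20),
    ship_bl     VARCHAR(20),
    cust_shipid VARCHAR(20)
);

-- The three sample rows shown above.
INSERT INTO demo_sometable
    (id, custm_id, status, cust_ref, cust_po, ship_bl, cust_shipid)
VALUES
    (997, 1234, 'open', '1234', '9656', '5656', '9876'),
    (998, 1234, 'open', '1234', '6353', '2436', '9394'),
    (999, 1234, 'open', '4327', '6353', '4388', '4353');

-- Same shape as the single-column query, pointed at the demo table.
SELECT cust_ref AS val, COUNT(*) AS cnt, 'CUST_REF' AS what_column
FROM demo_sometable
WHERE status != 'whateverStatus'
  AND custm_id = 1234
GROUP BY cust_ref
HAVING COUNT(cust_ref) > 1;
```

With only these three rows the query returns a single row (1234, 2, CUST_REF). A row with a blank val and its own count would additionally require several rows whose cust_ref is an empty string, since GROUP BY collects equal values, including empty strings, into one group.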

Hope that helps!

Answer 1

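Your explanation of what looks like a very simple problem is quite convoluted, and you have not clearly explained what you want to count as a "duplicate": do you want to count the records whose value occurs more than once, or count the values that occur more than once?

You have confused matters further by conflating the count of duplicated values with the count of the domain. It is pure coincidence that the second row in your query output is 2; that is not the value you are looking for, it just happens to have the same cardinality.

"The value 2 is correct for this column"

suggests you want the latter. In that case, since: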

select  cust_ref, count(*) as cnt, 'CUST_REF' as what_column
from sometable 
where status != 'whateverStatus' 
   and custm_id = 1234
group by cust_ref having count(cust_ref) > 1;

will give you the former, you just need to count the rows that query returns. You can do this in 2 ways:

SELECT COUNT(*) AS number_of_values_in_more_than_row, what_column
FROM (
   select  count(*) as cnt, 'CUST_REF' as what_column, cust_ref
   from sometable 
   where status != 'whateverStatus' 
      and custm_id = 1234
   group by cust_ref 
   having count(cust_ref) > 1
) AS d
GROUP BY what_column

.... or ....

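-- Caution: with GROUP BY cust_ref, COUNT(DISTINCT cust_ref) is at most 1 in each group,
-- so as written the HAVING clause below filters out every row.
select  count(DISTINCT cust_ref) as cnt, 'CUST_REF' as what_column
from sometable 
where status != 'whateverStatus' 
    and custm_id = 1234
group by cust_ref 
having count(DISTINCT cust_ref) > 1;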
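
To make the two readings of "duplicate" concrete, here is a small sketch on made-up data. The table demo_dupes and its values are hypothetical; only the GROUP BY / HAVING pattern comes from the queries above.

```sql
-- Hypothetical data: 'A' appears twice, 'B' three times, 'C' once.
DROP TABLE IF EXISTS demo_dupes;
CREATE TABLE demo_dupes (
    id       INT PRIMARY KEY,
    cust_ref VARCHAR(20)
);

INSERT INTO demo_dupes (id, cust_ref) VALUES
    (1, 'A'), (2, 'A'),
    (3, 'B'), (4, 'B'), (5, 'B'),
    (6, 'C');

-- The inner query keeps one row per value that occurs more than once.
-- COUNT(*) over it counts those values; SUM(cnt) counts the records carrying them.
SELECT COUNT(*) AS duplicated_values,
       SUM(cnt) AS records_involved
FROM (
    SELECT cust_ref, COUNT(*) AS cnt
    FROM demo_dupes
    GROUP BY cust_ref
    HAVING COUNT(*) > 1
) AS d;
-- duplicated_values = 2 (A and B), records_involved = 5 (2 + 3)
```

The derived table carries an alias (d here) because MySQL rejects a subquery in FROM that has none.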

Answer 2

You have already found the duplicates. So if you only want the column names, without the cnt column, wrap the query in a subquery:

select distinct what_column 
 from (
select  count(*) as cnt, 'CUST_REF' as what_column
from sometable 
 where status != 'whateverStatus' 
 and custm_id = 1234
group by cust_ref having count(cust_ref) > 1
union
 select  count(*) as cnt, 'CUST_PO' as what_column
 from sometable 
 where status != 'whateverStatus' 
  and custm_id = 1234
 group by cust_po having count(cust_po) > 1
union
 select count(*) as cnt, 'SHIP_BL' as what_column
from sometable 
 where status != 'whateverStatus' 
and custm_id = 1234
 group by ship_bl having count(ship_bl) > 1
union
select count(*) as cnt, 'CUST_SHIPID' as what_column
  from sometable 
where status != 'whateverStatus' 
and custm_id = 1234
group by cust_shipid having count(cust_shipid) > 1) AS d;
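
As a rough check of this approach, the sketch below runs the same pattern over the three sample rows from the question. The table name demo_sometable, the column types, and the status value 'open' are assumptions, and only three of the four columns are included to keep it short.

```sql
-- Hypothetical schema: types and the 'open' status are assumed for illustration.
DROP TABLE IF EXISTS demo_sometable;
CREATE TABLE demo_sometable (
    id       INT PRIMARY KEY,
    custm_id INT,
    status   VARCHAR(20),
    cust_ref VARCHAR(20),
    cust_po  VARCHAR(20),
    ship_bl  VARCHAR(20)
);

INSERT INTO demo_sometable (id, custm_id, status, cust_ref, cust_po, ship_bl) VALUES
    (997, 1234, 'open', '1234', '9656', '5656'),
    (998, 1234, 'open', '1234', '6353', '2436'),
    (999, 1234, 'open', '4327', '6353', '4388');

-- One branch per column; DISTINCT collapses each column to a single output row.
SELECT DISTINCT what_column
FROM (
    SELECT COUNT(*) AS cnt, 'CUST_REF' AS what_column
    FROM demo_sometable
    WHERE status != 'whateverStatus' AND custm_id = 1234
    GROUP BY cust_ref HAVING COUNT(cust_ref) > 1
    UNION
    SELECT COUNT(*) AS cnt, 'CUST_PO' AS what_column
    FROM demo_sometable
    WHERE status != 'whateverStatus' AND custm_id = 1234
    GROUP BY cust_po HAVING COUNT(cust_po) > 1
    UNION
    SELECT COUNT(*) AS cnt, 'SHIP_BL' AS what_column
    FROM demo_sometable
    WHERE status != 'whateverStatus' AND custm_id = 1234
    GROUP BY ship_bl HAVING COUNT(ship_bl) > 1
) AS d
ORDER BY what_column;
-- Returns CUST_PO and CUST_REF; SHIP_BL is absent because its three values are all different.
```

The ORDER BY is only there to make the output order predictable; it is not part of the original query.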

Answer 3

The answer that finally worked used a HAVING clause on the outer query, which returned the correct numbers I needed:

SELECT sum(cnt) as dupes, COUNT(*) AS number_of_values_in_more_than_row, what_column
  FROM (
select  count(*) as cnt, 'CUST_REF' as what_column,cust_ref
 from sometable 
  where status != 'whateverStatus' 
    and custm_id = 1234
 group by cust_ref having count(cust_ref) > 1
 union
 select  count(*) as cnt, 'CUST_PO' as what_column,cust_po
 from sometable 
  where status != 'whateverStatus' 
    and custm_id = 1234
 group by cust_po having count(cust_po) > 1
  union
 select count(*) as cnt, 'SHIP_BL' as what_column,ship_bl
 from sometable 
  where status != 'whateverStatus' 
    and custm_id = 1234
 group by ship_bl having count(ship_bl) > 1
  union
 select count(*) as cnt, 'CUST_SHIPID' as what_column,cust_shipid
 from sometable 
  where status != 'whateverStatus' 
    and custm_id = 1234
 group by cust_shipid having count(cust_shipid) > 1
 )x
 GROUP BY what_column having count(number_of_values_in_more_than_row) >0;
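
A possible explanation for the large first row in the question's output (9440 for CUST_REF), offered as an assumption since the real data was never shared, is that many rows hold an empty string in that column. GROUP BY collects them into a single group, and that group passes the HAVING test. It cannot be a NULL group, because COUNT(cust_ref) skips NULLs and would return 0 for it. Under that assumption, filtering out empty strings in the WHERE clause removes the extra row. The table demo_blanks and its contents are made up for illustration.

```sql
-- Hypothetical data: three empty strings, two NULLs, one real duplicate ('1234').
DROP TABLE IF EXISTS demo_blanks;
CREATE TABLE demo_blanks (
    id       INT PRIMARY KEY,
    cust_ref VARCHAR(20)
);

INSERT INTO demo_blanks (id, cust_ref) VALUES
    (1, ''), (2, ''), (3, ''),
    (4, '1234'), (5, '1234'),
    (6, '4327'),
    (7, NULL), (8, NULL);

-- Without a filter, two groups pass: '' with cnt 3 and '1234' with cnt 2.
-- The NULL group is dropped because COUNT(cust_ref) is 0 for it.
SELECT cust_ref AS val, COUNT(*) AS cnt, 'CUST_REF' AS what_column
FROM demo_blanks
GROUP BY cust_ref
HAVING COUNT(cust_ref) > 1;

-- Excluding empty strings leaves only the real duplicate: ('1234', 2, 'CUST_REF').
SELECT cust_ref AS val, COUNT(*) AS cnt, 'CUST_REF' AS what_column
FROM demo_blanks
WHERE cust_ref <> ''
GROUP BY cust_ref
HAVING COUNT(cust_ref) > 1;
```

If the blank values in the real table turn out to be something else (a placeholder such as '0' or 'N/A', for instance), the same idea applies with a different condition in the WHERE clause.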

MySQL GROUP BY shows the total in the column before the counted values; how can I stop this?
