Find duplicate rows, but only on a unique column

Date: 2019-11-25 19:12:12

Tags: sql oracle oracle12c

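Using Oracle 12c. I am trying to identify duplicate rows that have distinct ref1_descr values. The count should be grouped on the first three columns (emplid, item_type, acad_year) and should count each ref1_descr only once.

For example, this result should be ignored, because both rows share the same ref1_descr: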

+-------------+--------------+-----------+------------+
|   EMPLID    |  ITEM_TYPE   | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000010315 | 103201000000 |      2020 |    1938427 |
| 00000010315 | 103201000000 |      2020 |    1938427 |
+-------------+--------------+-----------+------------+

This one should be flagged, because the duplicate rows have different ref1_descr values:

+-------------+--------------+-----------+------------+
|   EMPLID    |  ITEM_TYPE   | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000592537 | 104110123000 |      2020 |    1941668 |
| 00000592537 | 104110123000 |      2020 |    1941164 |
+-------------+--------------+-----------+------------+

The query below returns both examples, but I need it to ignore the first one, because those rows share a ref1_descr:

SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1

Edit

Apologies, I should have included the expected output in the original question.

With the suggested extra condition in the HAVING clause:

SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1 AND
       MIN(REF1_DESCR) <> MAX(REF1_DESCR);

For these rows:

+-------------+--------------+-----------+------------+
|   EMPLID    |  ITEM_TYPE   | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000027710 | 104300113000 |      2020 |    1956315 |
| 00000027710 | 104300113000 |      2020 |    1946006 |
| 00000027710 | 104300113000 |      2020 |    1946006 |
| 00000027710 | 104300113000 |      2020 |    1946006 |
+-------------+--------------+-----------+------------+

Result:

+-------------+--------------+-----------+----------+
|   EMPLID    |  ITEM_TYPE   | ACAD_YEAR | COUNT(*) |
+-------------+--------------+-----------+----------+
| 00000027710 | 104300113000 |      2020 |        4 |
+-------------+--------------+-----------+----------+

I would expect it to return 2.

3 answers:

Answer 0 (score: 1)

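I think you want an extra condition in the HAVING clause:

SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1 AND
       MIN(REF1_DESCR) <> MAX(REF1_DESCR);

Edit:

Actually, if the descriptions differ, the group must contain at least two rows, so you can drop the COUNT(*) condition:

HAVING MIN(REF1_DESCR) <> MAX(REF1_DESCR);

This seems to be the simplest solution.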
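
As a quick check of that reasoning, here is a minimal sketch that runs the simplified filter against the two example groups from the question. The CTE name sample_rows and the choice to store emplid and item_type as strings (to keep the leading zeros) are assumptions for illustration, not the real ps_item_sf definition.

```sql
-- Hypothetical inline data standing in for ps_item_sf (column types are assumed).
WITH sample_rows (emplid, item_type, acad_year, ref1_descr) AS (
  SELECT '00000010315', '103201000000', 2020, 1938427 FROM dual UNION ALL
  SELECT '00000010315', '103201000000', 2020, 1938427 FROM dual UNION ALL
  SELECT '00000592537', '104110123000', 2020, 1941668 FROM dual UNION ALL
  SELECT '00000592537', '104110123000', 2020, 1941164 FROM dual
)
SELECT emplid, item_type, acad_year, COUNT(*) AS row_cnt
FROM sample_rows
GROUP BY emplid, item_type, acad_year
-- MIN <> MAX can only be true when the group holds at least two
-- different ref1_descr values, so COUNT(*) > 1 is implied.
HAVING MIN(ref1_descr) <> MAX(ref1_descr);
```

Under these assumptions only the 00000592537 group should come back; the 00000010315 group is dropped because its minimum and maximum ref1_descr are equal. Note that MIN and MAX ignore NULLs, so a group whose only difference is a NULL ref1_descr would not be flagged.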

Answer 1 (score: 1)

How about DISTINCT? See line 10:

SQL> with test (emplid, item_type, acad_year, ref1_descr) as
  2    (select 27710, 104300113000 , 2020, 1956315 from dual union all
  3     select 27710, 104300113000 , 2020, 1946006 from dual union all
  4     select 27710, 104300113000 , 2020, 1946006 from dual union all
  5     select 27710, 104300113000 , 2020, 1946006 from dual
  6    )
  7  select emplid,
  8         item_Type,
  9         acad_year,
 10         count(distinct ref1_descr) cnt      --> DISTINCT here?
 11  from test
 12  group by emplid, item_type, acad_year
 13  having count(*) > 1
 14    and min(ref1_descr) <> max(ref1_descr);

    EMPLID      ITEM_TYPE  ACAD_YEAR        CNT
---------- -------------- ---------- ----------
     27710   104300113000       2020          2

SQL>
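
The same DISTINCT count can also serve as the filter itself. The following sketch is a variation on the query above rather than part of the original answer: it reuses the same inline test data and replaces both HAVING conditions with a single COUNT(DISTINCT ...) test.

```sql
-- Same sample data as above; "test" is only an inline stand-in table.
WITH test (emplid, item_type, acad_year, ref1_descr) AS (
  SELECT 27710, 104300113000, 2020, 1956315 FROM dual UNION ALL
  SELECT 27710, 104300113000, 2020, 1946006 FROM dual UNION ALL
  SELECT 27710, 104300113000, 2020, 1946006 FROM dual UNION ALL
  SELECT 27710, 104300113000, 2020, 1946006 FROM dual
)
SELECT emplid, item_type, acad_year,
       COUNT(DISTINCT ref1_descr) AS cnt
FROM test
GROUP BY emplid, item_type, acad_year
-- More than one distinct ref1_descr implies more than one row,
-- so no separate COUNT(*) > 1 check is needed.
HAVING COUNT(DISTINCT ref1_descr) > 1;
```

For this data it should return a single row with cnt = 2, matching the expected output in the question. Whether this or the MIN/MAX comparison performs better likely depends on the data and the execution plan, so it is worth comparing both on the real table.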

Answer 2 (score: 1)

One option is to use the COUNT() analytic function with DISTINCT ref1_descr, partitioned by the other three columns:

with t as
(
select count(distinct ref1_descr) over (partition by emplid,  item_Type, acad_year) as cnt,
       t.*
  from tab t
)  
select emplid, item_type, acad_year, ref1_descr
  from t
 where cnt > 1 

This returns only the two rows from the second example.

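
Because the analytic form keeps every row of a qualifying group, repeated ref1_descr values come back repeatedly. A sketch of one way to collapse them, assuming the four-row sample from the question's edit (the inline data and the names sample_rows and counted are hypothetical):

```sql
-- Hypothetical inline data; emplid and item_type are assumed to be strings.
WITH sample_rows (emplid, item_type, acad_year, ref1_descr) AS (
  SELECT '00000027710', '104300113000', 2020, 1956315 FROM dual UNION ALL
  SELECT '00000027710', '104300113000', 2020, 1946006 FROM dual UNION ALL
  SELECT '00000027710', '104300113000', 2020, 1946006 FROM dual UNION ALL
  SELECT '00000027710', '104300113000', 2020, 1946006 FROM dual
),
counted AS (
  -- Attach the number of distinct ref1_descr values in the group to every row.
  SELECT s.*,
         COUNT(DISTINCT ref1_descr)
           OVER (PARTITION BY emplid, item_type, acad_year) AS cnt
  FROM sample_rows s
)
-- DISTINCT collapses the three identical 1946006 rows into one.
SELECT DISTINCT emplid, item_type, acad_year, ref1_descr
FROM counted
WHERE cnt > 1;
```

With this data the query should return two rows, one per distinct ref1_descr (1956315 and 1946006). Without DISTINCT in the outer SELECT, all four rows would be returned.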