Question

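Using Oracle 12c. I am trying to identify duplicate rows that have a unique ref1_descr field. The count should be grouped on the first three columns (emplid, item_type, and acad_year), and each ref1_descr should be counted only once.

For example, this result should be ignored, because both rows belong to the same ref1_descr.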

+-------------+--------------+-----------+------------+
|   EMPLID    |  ITEM_TYPE   | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000010315 | 103201000000 |      2020 |    1938427 |
| 00000010315 | 103201000000 |      2020 |    1938427 |
+-------------+--------------+-----------+------------+

This one should be flagged, because there are duplicates with unique ref1_descr values.

+-------------+--------------+-----------+------------+
|   EMPLID    |  ITEM_TYPE   | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000592537 | 104110123000 |      2020 |    1941668 |
| 00000592537 | 104110123000 |      2020 |    1941164 |
+-------------+--------------+-----------+------------+

This query picks up both examples, but I need it to ignore the first one, because those rows share a ref1_descr.

SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1

Edit

Apologies, I should have included the expected output in the original question.

With the suggested extra condition in the HAVING clause:

SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1 AND
       MIN(REF1_DESCR) <> MAX(REF1_DESCR);

Given these rows:

+-------------+--------------+-----------+------------+
|   EMPLID    |  ITEM_TYPE   | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000027710 | 104300113000 |      2020 |    1956315 |
| 00000027710 | 104300113000 |      2020 |    1946006 |
| 00000027710 | 104300113000 |      2020 |    1946006 |
| 00000027710 | 104300113000 |      2020 |    1946006 |
+-------------+--------------+-----------+------------+

Result:

+-------------+--------------+-----------+----------+
|   EMPLID    |  ITEM_TYPE   | ACAD_YEAR | COUNT(*) |
+-------------+--------------+-----------+----------+
| 00000027710 | 104300113000 |      2020 |        4 |
+-------------+--------------+-----------+----------+

I expected it to return 2.

Answer 1

I think you want an extra condition in the HAVING clause:

SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1 AND
       MIN(REF1_DESCR) <> MAX(REF1_DESCR);

Actually, if the descriptions differ, then there are at least two rows, so you can drop the COUNT(*) condition:

HAVING MIN(REF1_DESCR) <> MAX(REF1_DESCR);

Edit: this seems to be the simplest solution.
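
As a hedged sketch rather than part of the original answer: the same filter can also be written with COUNT(DISTINCT ...), which additionally yields the number of distinct ref1_descr values the question asks for. It assumes the table and columns from the question; the alias distinct_descr is made up for illustration.

```sql
-- Sketch only: uses ps_item_sf and the columns named in the question.
-- COUNT(DISTINCT ...) skips NULLs, just as MIN and MAX do, so for
-- non-NULL values this keeps the same groups as MIN(...) <> MAX(...).
SELECT emplid,
       item_type,
       acad_year,
       COUNT(DISTINCT ref1_descr) AS distinct_descr  -- hypothetical alias
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(DISTINCT ref1_descr) > 1;
```

For the four sample rows of emplid 00000027710 in the question, there are two distinct ref1_descr values (1956315 and 1946006), so this form would report 2 instead of 4.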

Answer 2

How about DISTINCT? See line 10:

SQL> with test (emplid, item_type, acad_year, ref1_descr) as
  2    (select 27710, 104300113000 , 2020, 1956315 from dual union all
  3     select 27710, 104300113000 , 2020, 1946006 from dual union all
  4     select 27710, 104300113000 , 2020, 1946006 from dual union all
  5     select 27710, 104300113000 , 2020, 1946006 from dual
  6    )
  7  select emplid,
  8         item_Type,
  9         acad_year,
 10         count(distinct ref1_descr) cnt      --> DISTINCT here?
 11  from test
 12  group by emplid, item_type, acad_year
 13  having count(*) > 1
 14    and min(ref1_descr) <> max(ref1_descr);

    EMPLID      ITEM_TYPE  ACAD_YEAR        CNT
---------- -------------- ---------- ----------
     27710   104300113000       2020          2

SQL>

Answer 3

One option is to use the count() analytic function with distinct ref1_descr, partitioned by the other three columns:

with t as
(
select count(distinct ref1_descr) over (partition by emplid,  item_Type, acad_year) as cnt,
       t.*
  from tab t
)  
select emplid, item_type, acad_year, ref1_descr
  from t
 where cnt > 1

This returns only those two rows.
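
Note that the query above returns every row of a qualifying group, including rows that repeat the same ref1_descr. If each distinct combination should appear only once, one possible variant (a sketch under the same assumptions, with tab as the placeholder table name used above and d and c as made-up CTE names) is to de-duplicate first and then count rows per group:

```sql
-- Sketch only: tab is the placeholder table from the query above.
WITH d AS (
  -- Collapse exact repeats so each ref1_descr appears once per group.
  SELECT DISTINCT emplid, item_type, acad_year, ref1_descr
    FROM tab
),
c AS (
  -- After de-duplication, the row count per group equals the number
  -- of distinct ref1_descr values in that group.
  SELECT d.*,
         COUNT(*) OVER (PARTITION BY emplid, item_type, acad_year) AS cnt
    FROM d
)
SELECT emplid, item_type, acad_year, ref1_descr
  FROM c
 WHERE cnt > 1;
```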

Demo

Find duplicate rows, but only for a unique column
