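Using Oracle 12c. I am trying to identify duplicate rows that have distinct ref1_descr values. The count should be grouped on the first three columns (emplid, item_type, and acad_year), and each ref1_descr should be counted only once.
For example, this result should be ignored, because both rows have the same ref1_descr: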
+-------------+--------------+-----------+------------+
| EMPLID | ITEM_TYPE | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000010315 | 103201000000 | 2020 | 1938427 |
| 00000010315 | 103201000000 | 2020 | 1938427 |
+-------------+--------------+-----------+------------+
This one should be reported, because the duplicate rows have different ref1_descr values:
+-------------+--------------+-----------+------------+
| EMPLID | ITEM_TYPE | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000592537 | 104110123000 | 2020 | 1941668 |
| 00000592537 | 104110123000 | 2020 | 1941164 |
+-------------+--------------+-----------+------------+
The following query picks up both examples, but I need it to ignore the first one because those rows share a single ref1_descr.
SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1
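Edit
Apologies, I should have included the expected output in the original question.
Using the suggested extra condition in the HAVING clause:
SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1 AND
       MIN(REF1_DESCR) <> MAX(REF1_DESCR);
against these rows: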
+-------------+--------------+-----------+------------+
| EMPLID | ITEM_TYPE | ACAD_YEAR | REF1_DESCR |
+-------------+--------------+-----------+------------+
| 00000027710 | 104300113000 | 2020 | 1956315 |
| 00000027710 | 104300113000 | 2020 | 1946006 |
| 00000027710 | 104300113000 | 2020 | 1946006 |
| 00000027710 | 104300113000 | 2020 | 1946006 |
+-------------+--------------+-----------+------------+
Result:
+-------------+--------------+-----------+----------+
| EMPLID | ITEM_TYPE | ACAD_YEAR | COUNT(*) |
+-------------+--------------+-----------+----------+
| 00000027710 | 104300113000 | 2020 | 4 |
+-------------+--------------+-----------+----------+
I expected it to return 2.
Answer 0 (score: 1)
I think you want an extra condition in the HAVING clause:
SELECT emplid, item_type, acad_year, COUNT(*)
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
HAVING COUNT(*) > 1 AND
       MIN(REF1_DESCR) <> MAX(REF1_DESCR);
Actually, if the descriptions differ, then there are at least two rows, so you can drop the COUNT(*) condition:
HAVING MIN(REF1_DESCR) <> MAX(REF1_DESCR);
Edit:
This seems to be the simplest solution.
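As a quick sanity check, the MIN/MAX filter can be tried against the two sample groups from the question. This is only a sketch: the inline demo_rows CTE is a hypothetical stand-in for ps_item_sf, and the values are written as strings to keep the leading zeros, since the real column types are not given in the question.

```sql
-- demo_rows is a hypothetical stand-in for ps_item_sf (column types assumed)
WITH demo_rows (emplid, item_type, acad_year, ref1_descr) AS (
  SELECT '00000010315', '103201000000', 2020, '1938427' FROM dual UNION ALL
  SELECT '00000010315', '103201000000', 2020, '1938427' FROM dual UNION ALL
  SELECT '00000592537', '104110123000', 2020, '1941668' FROM dual UNION ALL
  SELECT '00000592537', '104110123000', 2020, '1941164' FROM dual
)
SELECT emplid, item_type, acad_year, COUNT(*) AS row_cnt
FROM demo_rows
GROUP BY emplid, item_type, acad_year
-- a group passes only if it holds at least two different ref1_descr values
HAVING MIN(ref1_descr) <> MAX(ref1_descr);
```

Only the 00000592537 group should come back. The 00000010315 rows share one ref1_descr, so MIN equals MAX for that group and it is filtered out.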
Answer 1 (score: 1)
How about DISTINCT? See line 10:
SQL> with test (emplid, item_type, acad_year, ref1_descr) as
2 (select 27710, 104300113000 , 2020, 1956315 from dual union all
3 select 27710, 104300113000 , 2020, 1946006 from dual union all
4 select 27710, 104300113000 , 2020, 1946006 from dual union all
5 select 27710, 104300113000 , 2020, 1946006 from dual
6 )
7 select emplid,
8 item_Type,
9 acad_year,
10 count(distinct ref1_descr) cnt --> DISTINCT here?
11 from test
12 group by emplid, item_type, acad_year
13 having count(*) > 1
14 and min(ref1_descr) <> max(ref1_descr);
EMPLID ITEM_TYPE ACAD_YEAR CNT
---------- -------------- ---------- ----------
27710 104300113000 2020 2
SQL>
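A possible variation, not taken from the answer above: the same DISTINCT count could also serve as the filter in place of the MIN/MAX comparison. Both forms ignore NULLs, so they should select the same groups. A minimal sketch against the table named in the question:

```sql
SELECT emplid, item_type, acad_year,
       COUNT(DISTINCT ref1_descr) AS cnt   -- each ref1_descr counted once
FROM ps_item_sf
GROUP BY emplid, item_type, acad_year
-- keep only groups with more than one distinct ref1_descr
HAVING COUNT(DISTINCT ref1_descr) > 1;
```

Whether this or MIN/MAX reads better is mostly a matter of taste. The DISTINCT form states the intent directly, while MIN/MAX avoids a second distinct aggregation.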
Answer 2 (score: 1)
One option is to use the count() analytic function with distinct ref1_descr, partitioned by the remaining three columns:
with t as
(
select count(distinct ref1_descr) over (partition by emplid, item_Type, acad_year) as cnt,
t.*
from tab t
)
select emplid, item_type, acad_year, ref1_descr
from t
where cnt > 1
For the two examples in the question, this returns only the two rows of the second one.
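To try the analytic version without touching the real table, the question's two example groups can be inlined. This is a sketch under assumptions: demo_rows and counted are hypothetical names, and the values are written as strings because the actual column types are unknown.

```sql
-- demo_rows is a hypothetical stand-in for the real table
WITH demo_rows (emplid, item_type, acad_year, ref1_descr) AS (
  SELECT '00000010315', '103201000000', 2020, '1938427' FROM dual UNION ALL
  SELECT '00000010315', '103201000000', 2020, '1938427' FROM dual UNION ALL
  SELECT '00000592537', '104110123000', 2020, '1941668' FROM dual UNION ALL
  SELECT '00000592537', '104110123000', 2020, '1941164' FROM dual
),
counted AS (
  -- attach to every row the number of distinct ref1_descr values in its group
  SELECT d.*,
         COUNT(DISTINCT ref1_descr)
           OVER (PARTITION BY emplid, item_type, acad_year) AS cnt
  FROM demo_rows d
)
SELECT emplid, item_type, acad_year, ref1_descr
FROM counted
WHERE cnt > 1;
```

Unlike the GROUP BY approaches, this keeps the individual rows, so ref1_descr stays visible in the output. Note that with the four-row data from the question's edit it would return all four rows, including the three that repeat 1946006.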