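I have a table in which some rows repeat the same values across several columns. I am trying to write two SQL queries: one that returns the complete set of duplicate records and one that returns the complete set of unique records. I have written a few queries, but my unique set still picks up one record from each duplicate group.
Sample data: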
pay_id, pay_ratio, pay_type, cor_id
123, 12, C, Annual
123, 12, C, Annual
456, 13, A, Semi-Annual
476, 43, B, Monthly
987, 32, H, Daily
987, 32, H, Daily
I am trying to split the above data set as follows.
Unique data set
pay_id, pay_ratio, pay_type, cor_id
456, 13, A, Semi-Annual
476, 43, B, Monthly
Duplicate data set
pay_id, pay_ratio, pay_type, cor_id
123, 12, C, Annual
123, 12, C, Annual
987, 32, H, Daily
987, 32, H, Daily
Can someone suggest how I can achieve this with a SQL query?
Regards, 西
Answer 0 (score: 3)
You can use COUNT() OVER():
SELECT pay_id, pay_ratio, pay_type, cor_id,
CASE
WHEN COUNT(*) OVER (PARTITION BY pay_id, pay_ratio, pay_type, cor_id) = 1
THEN 'unique'
ELSE 'dupl'
END AS type
FROM mytable
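The above query returns 'unique' for unique records and 'dupl' for duplicate records. You can wrap the query in a CTE or a subquery and filter it as needed.
Note: the above query assumes that a duplicate is a row that matches another row on all 4 fields of the table. You can change the PARTITION BY clause as needed to implement a different definition of a duplicate.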
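The wrapping step described above can be sketched as follows. This is a minimal illustration, not the answerer's own code: it assumes an in-memory SQLite database (version 3.25 or later, for window function support) standing in for the asker's unnamed database, and the names `counted`, `cnt`, and `QUERY` are made up for the example.

```python
import sqlite3

# Hypothetical in-memory table loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable "
    "(pay_id INTEGER, pay_ratio INTEGER, pay_type TEXT, cor_id TEXT)"
)
rows = [
    (123, 12, "C", "Annual"),
    (123, 12, "C", "Annual"),
    (456, 13, "A", "Semi-Annual"),
    (476, 43, "B", "Monthly"),
    (987, 32, "H", "Daily"),
    (987, 32, "H", "Daily"),
]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

# The CTE counts how many rows share all four column values;
# the outer query keeps only rows whose count satisfies the condition.
QUERY = """
WITH counted AS (
    SELECT pay_id, pay_ratio, pay_type, cor_id,
           COUNT(*) OVER (
               PARTITION BY pay_id, pay_ratio, pay_type, cor_id
           ) AS cnt
    FROM mytable
)
SELECT pay_id, pay_ratio, pay_type, cor_id
FROM counted
WHERE cnt {condition}
ORDER BY pay_id
"""

unique_rows = conn.execute(QUERY.format(condition="= 1")).fetchall()
duplicate_rows = conn.execute(QUERY.format(condition="> 1")).fetchall()

print(unique_rows)     # rows that occur exactly once
print(duplicate_rows)  # every copy of every repeated row
```

Filtering on the count in an outer query is needed because a window function cannot be referenced in the WHERE clause of the same SELECT that computes it.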
Answer 1 (score: 0)
Using a common table expression with row_number().
Unique records:
;with cte as (
select *
, rn = row_number() over (partition by pay_id order by pay_type)
from t
)
select *
from cte
where not exists (
select 1
from cte i
where i.pay_id = cte.pay_id
and i.rn > 1
)
rextester demo: http://rextester.com/LHIGTY81886
Returns:
+--------+-----------+----------+-------------+----+
| pay_id | pay_ratio | pay_type | cor_id | rn |
+--------+-----------+----------+-------------+----+
| 456 | 13 | A | Semi-Annual | 1 |
| 476 | 43 | B | Monthly | 1 |
+--------+-----------+----------+-------------+----+
Duplicates:
;with cte as (
select *
, rn = row_number() over (partition by pay_id order by pay_type)
from t
)
select *
from cte
where exists (
select 1
from cte i
where i.pay_id = cte.pay_id
and i.rn > 1
)
Returns:
+--------+-----------+----------+--------+----+
| pay_id | pay_ratio | pay_type | cor_id | rn |
+--------+-----------+----------+--------+----+
| 123 | 12 | C | Annual | 1 |
| 123 | 12 | C | Annual | 2 |
| 987 | 32 | H | Daily | 1 |
| 987 | 32 | H | Daily | 2 |
+--------+-----------+----------+--------+----+
Answer 2 (score: 0)
Group and count:
SELECT
pay_id, pay_ratio, pay_type, cor_id,
COUNT(*) AS [Duplicates]
FROM mytable /* placeholder name; TABLE itself is a reserved word */
GROUP BY pay_id, pay_ratio, pay_type, cor_id
HAVING COUNT(*) = 1 /* unique; use > 1 for duplicates */
ORDER BY COUNT(*) ASC
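Note that a grouped count returns one row per group, while the question asks for every copy of each duplicated row. One way to bridge that gap, sketched below, is to join the grouped result back to the table. This is an illustration under assumptions: an in-memory SQLite database stands in for the asker's database, and the names `mytable`, `copies`, `GROUPED`, and `JOINED` are made up for the example.

```python
import sqlite3

# Hypothetical in-memory table loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable "
    "(pay_id INTEGER, pay_ratio INTEGER, pay_type TEXT, cor_id TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (123, 12, "C", "Annual"),
        (123, 12, "C", "Annual"),
        (456, 13, "A", "Semi-Annual"),
        (476, 43, "B", "Monthly"),
        (987, 32, "H", "Daily"),
        (987, 32, "H", "Daily"),
    ],
)

# One row per duplicated combination, with the number of copies.
GROUPED = """
SELECT pay_id, pay_ratio, pay_type, cor_id, COUNT(*) AS copies
FROM mytable
GROUP BY pay_id, pay_ratio, pay_type, cor_id
HAVING COUNT(*) > 1
ORDER BY pay_id
"""
duplicate_groups = conn.execute(GROUPED).fetchall()

# Join the duplicated combinations back to the table to recover every copy.
JOINED = """
SELECT m.pay_id, m.pay_ratio, m.pay_type, m.cor_id
FROM mytable AS m
JOIN (
    SELECT pay_id, pay_ratio, pay_type, cor_id
    FROM mytable
    GROUP BY pay_id, pay_ratio, pay_type, cor_id
    HAVING COUNT(*) > 1
) AS d
  ON  m.pay_id    = d.pay_id
  AND m.pay_ratio = d.pay_ratio
  AND m.pay_type  = d.pay_type
  AND m.cor_id    = d.cor_id
ORDER BY m.pay_id
"""
all_duplicate_rows = conn.execute(JOINED).fetchall()

print(duplicate_groups)
print(all_duplicate_rows)
```

Changing `> 1` to `= 1` in the subquery gives the unique set instead. Because the join compares columns with `=`, rows containing NULL in any of the four columns would not match and would need separate handling.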