How to identify unique and duplicate records with SQL

Time: 2017-05-02 16:32:50

Tags: sql sql-server

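I have a table in which some rows are repeated, and I am trying to write two SQL queries: one that returns the entire set of duplicated records and one that returns the entire set of unique records. I have written a few queries, but my unique set keeps picking up one record from each group of duplicates.

Sample data: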

pay_id, pay_ratio, pay_type, cor_id
123,    12,        C,        Annual
123,    12,        C,        Annual
456,    13,        A,        Semi-Annual
476,    43,        B,        Monthly
987,    32,        H,        Daily
987,    32,        H,        Daily

I am trying to split the data set above as follows.

Unique data set

pay_id, pay_ratio, pay_type, cor_id
456,    13,        A,        Semi-Annual
476,    43,        B,        Monthly

Duplicate data set

pay_id, pay_ratio, pay_type, cor_id
123,    12,        C,        Annual
123,    12,        C,        Annual
987,    32,        H,        Daily
987,    32,        H,        Daily

Can anyone suggest how to achieve this with SQL queries?

Regards, 西

3 answers:

Answer 0 (score: 3)

You can do this with COUNT() OVER():
SELECT pay_id, pay_ratio, pay_type, cor_id,
       CASE 
          WHEN COUNT(*) OVER (PARTITION BY pay_id, pay_ratio, pay_type, cor_id) = 1 
             THEN 'unique'
          ELSE 'dupl'
       END AS type
FROM mytable

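The query above labels each row 'unique' if it occurs only once and 'dupl' if it occurs more than once. You can wrap the query in a CTE or a subquery and filter on that label as needed.

Note: the query above assumes that a duplicate is determined by all 4 columns of the table. You can change the PARTITION BY clause to match a different definition of a duplicate.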
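The wrapping described above could look like the following minimal sketch. It assumes the table is called mytable, as in the query above; the CTE name flagged and the column alias cnt are made up for illustration and are not part of the original answer.

```sql
-- Sketch: compute the per-row occurrence count in a CTE, then filter on it.
-- "flagged" and "cnt" are illustrative names, not from the original answer.
WITH flagged AS (
    SELECT pay_id, pay_ratio, pay_type, cor_id,
           COUNT(*) OVER (PARTITION BY pay_id, pay_ratio, pay_type, cor_id) AS cnt
    FROM mytable
)
SELECT pay_id, pay_ratio, pay_type, cor_id
FROM flagged
WHERE cnt = 1;   -- unique set; change to cnt > 1 for the duplicate set
```

Because the window function keeps every input row, the cnt > 1 variant returns each duplicated row as many times as it occurs, which matches the duplicate data set requested in the question.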

Answer 1 (score: 0)

Using a common table expression with row_number()

Unique:

;with cte as (
  select *
    , rn = row_number() over (partition by pay_id order by pay_type)
  from t
)
select *
from cte
where not exists (
  select 1
  from cte i
  where i.pay_id = cte.pay_id
    and i.rn > 1
  )

rextester demo: http://rextester.com/LHIGTY81886

Returns:

+--------+-----------+----------+-------------+----+
| pay_id | pay_ratio | pay_type |   cor_id    | rn |
+--------+-----------+----------+-------------+----+
|    456 |        13 | A        | Semi-Annual |  1 |
|    476 |        43 | B        | Monthly     |  1 |
+--------+-----------+----------+-------------+----+

Duplicates:

;with cte as (
  select *
    , rn = row_number() over (partition by pay_id order by pay_type)
  from t
)
select *
from cte
where exists (
  select 1
  from cte i
  where i.pay_id = cte.pay_id
    and i.rn > 1
  )

Returns:

+--------+-----------+----------+--------+----+
| pay_id | pay_ratio | pay_type | cor_id | rn |
+--------+-----------+----------+--------+----+
|    123 |        12 | C        | Annual |  1 |
|    123 |        12 | C        | Annual |  2 |
|    987 |        32 | H        | Daily  |  1 |
|    987 |        32 | H        | Daily  |  2 |
+--------+-----------+----------+--------+----+

Answer 2 (score: 0)

Group and count:

SELECT
  pay_id, pay_ratio, pay_type, cor_id,
  COUNT(*) AS [Duplicates]
FROM mytable /* placeholder name; TABLE itself is a reserved word */
GROUP BY pay_id, pay_ratio, pay_type, cor_id
HAVING COUNT(*) = 1 /* unique; use > 1 for duplicates */
ORDER BY COUNT(*) ASC
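A GROUP BY query returns one row per group, so a duplicated record appears only once in its output, together with its count. If every copy of a duplicated row is wanted, as in the duplicate data set shown in the question, one option is to join the grouped result back to the table. The sketch below assumes the placeholder table name mytable and that none of the four columns contains NULL, since NULL values never compare equal in a join condition; the aliases t and d are illustrative.

```sql
-- Sketch: list every copy of each duplicated row.
-- Assumes a table named mytable and no NULLs in the four columns.
SELECT t.pay_id, t.pay_ratio, t.pay_type, t.cor_id
FROM mytable AS t
JOIN (
    -- one row per combination that occurs more than once
    SELECT pay_id, pay_ratio, pay_type, cor_id
    FROM mytable
    GROUP BY pay_id, pay_ratio, pay_type, cor_id
    HAVING COUNT(*) > 1
) AS d
  ON  d.pay_id    = t.pay_id
  AND d.pay_ratio = t.pay_ratio
  AND d.pay_type  = t.pay_type
  AND d.cor_id    = t.cor_id
ORDER BY t.pay_id;
```

Changing the HAVING condition to COUNT(*) = 1 gives the unique set instead; in that case the join is not strictly needed, because each such group consists of exactly one row.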