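Flagging duplicate records in a table

Time: 2018-05-24 11:56:00

Tags: sql sql-server tsql

I'm trying to flag duplicate records, but some of them end up assigned to the wrong group, and I don't know why.

Data: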

    FirstName | LastName | Company | Group | Status | ID
    x         | x        | x       | NULL  | NULL   | 1
    x         | x        | x       | NULL  | NULL   | 2

I then run this query to find matches on FirstName, LastName, and Company, and join the result back to the main table to flag the records:

    with d as (
        select ID, FirstName, LastName, Company,
               row_number() over (partition by FirstName, LastName, Company
                                  order by FirstName, LastName, Company) as nr
        from [dbo].xx
    )
    update b
    set Status = 'S',
        [Group] = d.ID
    from xx as b
    inner join d on
        b.FirstName = d.FirstName and
        b.LastName = d.LastName and
        b.Company = d.Company
    where d.nr = 1

Then I flag the primary records with P:

    update b
    set Status = 'P'
    from xx as b
    where b.ID = b.[Group]
    GO

What I expected:

    FirstName | LastName | Company | Group | Status | ID
    x         | x        | x       | 1     | P      | 1
    x         | x        | x       | 1     | S      | 2

What I got:

    FirstName | LastName | Company | Group | Status | ID
    x         | x        | x       | 2     | S      | 1
    x         | x        | x       | 1     | S      | 2

I'm working with about 1M records, and this only happens to some of them!

2 Answers:

Answer 0: (score: 1)

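Try this. In the original query the ORDER BY inside row_number() uses the same columns as the PARTITION BY, so every row in a partition ties and which one receives nr = 1 is not guaranteed. Ordering by ID makes the choice deterministic: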

    ;with d as (
        select
            ID,
            FirstName,
            LastName,
            Company,
            row_number() over (
                partition by FirstName, LastName, Company
                order by ID asc -- order by ID so the lowest ID in each group always gets nr = 1
            ) as nr
        from [dbo].xx
    ),
    e as (
        -- keep only the nr = 1 row of each group; it is joined to every matching record
        select * from d where nr = 1
    )
    update b
    set Status = case when e.ID = b.ID then 'P' else 'S' end, -- the nr = 1 row itself gets P, the rest get S
        [Group] = e.ID
    from xx as b
    inner join e on
        b.FirstName = e.FirstName and
        b.LastName = e.LastName and
        b.Company = e.Company
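
To check the result afterwards, a query along these lines could list any group that does not end up with exactly one primary record. It is only a sketch and assumes the table and column names used in the question (dbo.xx, [Group], Status); the aliases Members and Primaries are introduced here for illustration.

```sql
-- Sanity check (sketch): every group should contain exactly one 'P' row.
-- Any row returned points at a group that was flagged incorrectly.
SELECT
    [Group],
    COUNT(*) AS Members,
    SUM(CASE WHEN Status = 'P' THEN 1 ELSE 0 END) AS Primaries
FROM dbo.xx
GROUP BY [Group]
HAVING SUM(CASE WHEN Status = 'P' THEN 1 ELSE 0 END) <> 1;
```

Rows that the update never touched still have a NULL [Group], so they would show up here as a single group with zero primaries.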

Answer 1: (score: 1)

You can try something like the following:

;WITH RankedData AS
(
    SELECT
        T.ID,
        T.[Group],
        T.Status,
        T.FirstName,
        T.LastName,
        T.Company,
        GroupRanking = ROW_NUMBER() OVER (PARTITION BY T.FirstName, T.LastName, T.Company ORDER BY T.ID ASC)
    FROM
        dbo.xx AS T
)
UPDATE T SET
    [Group] = N.ID,
    Status = CASE WHEN T.GroupRanking = 1 THEN 'P' ELSE 'S' END
FROM
    RankedData AS T
    INNER JOIN RankedData AS N ON
        T.FirstName = N.FirstName AND
        T.LastName = N.LastName AND
        T.Company = N.Company AND
        N.GroupRanking = 1

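Keep in mind that the INNER JOIN only matches rows whose names and company are non-NULL. If any of these columns can contain NULLs, those rows will not be joined and you have to handle them separately.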
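
One possible way to sidestep the NULL issue is to drop the self-join altogether: PARTITION BY places NULLs in the same partition, so a windowed MIN(ID) can supply the primary ID directly. The following is a minimal sketch, assuming the dbo.xx table and columns from the question; PrimaryID is a name introduced here for illustration.

```sql
-- Sketch: flag duplicates without a join, so NULL names/companies are grouped too.
;WITH Ranked AS
(
    SELECT
        T.ID,
        T.[Group],
        T.Status,
        -- lowest ID within each (FirstName, LastName, Company) partition;
        -- NULL values fall into the same partition as each other
        PrimaryID = MIN(T.ID) OVER (PARTITION BY T.FirstName, T.LastName, T.Company)
    FROM
        dbo.xx AS T
)
UPDATE Ranked SET
    [Group] = PrimaryID,
    Status = CASE WHEN ID = PrimaryID THEN 'P' ELSE 'S' END;
```

Whether two rows with NULL in the same column should really count as duplicates of each other depends on the data, so this grouping behavior may or may not be what you want.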