Group multiple rows together using OR conditions

Time: 2018-04-17 19:30:20

Tags: sql sql-server group-by duplicates dense-rank

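I have a table of people and I need to find duplicate records using several possible matching rules. For example, rows should be grouped together if fname, lname and address are the same, or fname, lname and dob are the same, or fname, lname and email are the same. I can't come up with a way to do this in SQL. The rules above are only an illustration; the final grouping criteria will be stricter. I have set up an example with data in SQL Fiddle. In the result I want, records 2-5 are grouped together, and records 1 and 6 are unique rows.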

CREATE TABLE Persons (
    ID int IDENTITY(1,1),
    FirstName varchar(255),
    LastName varchar(255),
    Address1 varchar(255),
    City varchar(255),
    State varchar(255),
    BDay varchar(255),
    Email varchar(255)
);


INSERT INTO Persons
SELECT 'RICK', 'ALLEN', '44 Street', 'Minneapolis', 'MN', '1/2/1970','help@test.com'
UNION ALL
SELECT 'JENNIFER', 'ALLEN', '123 Street', 'Minneapolis', 'MN', '4/8/1980','test@test.com'
UNION ALL
SELECT 'JENNIFER', 'ALLEN', '123 Street', 'Minneapolis', 'MN', '4/8/1981','test@test.com'
UNION ALL
SELECT 'JENNIFER', 'ALLEN', '42 Street', 'Minneapolis', 'MN', '4/8/1980','test@test.com'
UNION ALL
SELECT 'JENNIFER', 'ALLEN', '123 Street', 'Minneapolis', 'MN', '4/8/1980','test2@test.com'
UNION ALL
SELECT 'STEVEN', 'ALLEN', '555 Street', 'Minneapolis', 'MN', '2/8/1980','help@test.com'

1 answer:

Answer 0 (score: 3)

You can use a not exists clause, which keeps a row only when no row with a higher ID matches it:

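-- Keep a row only if no later row (higher ID) is a duplicate of it
select  p1.*
from    Persons p1
where   not exists
        (
        select  *
        from    Persons p2
        where   p1.id < p2.id and
                p1.FirstName = p2.FirstName and
                p1.LastName = p2.LastName and
                -- matching on any one of these columns is enough
                (
                    p1.Address1 = p2.Address1 or
                    p1.BDay = p2.BDay or
                    p1.Email = p2.Email
                )
        )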

Working example on SQL Fiddle.
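If the goal is to see which group each row belongs to rather than to filter rows out, the same matching rules can be turned around. The following is only a sketch under stated assumptions, not part of the original answer: GroupID is a hypothetical column alias, and each row is labelled with the smallest ID among the earlier rows it matches directly, falling back to its own ID when there is none.

```sql
-- Sketch: label each row with the lowest ID of any earlier row it
-- matches directly. GroupID is a hypothetical alias.
select  p1.ID
,       p1.FirstName
,       p1.LastName
,       coalesce(
            (
            select  min(p2.ID)
            from    Persons p2
            where   p2.ID < p1.ID and
                    p2.FirstName = p1.FirstName and
                    p2.LastName = p1.LastName and
                    (
                        p2.Address1 = p1.Address1 or
                        p2.BDay = p1.BDay or
                        p2.Email = p1.Email
                    )
            ),
            p1.ID   -- no earlier match: the row starts its own group
        ) as GroupID
from    Persons p1
order by
        GroupID
,       p1.ID
```

With the sample data, and assuming the IDs were assigned 1 to 6 in insert order as the question implies, rows 2-5 all get GroupID 2, while rows 1 and 6 keep their own IDs. Note that only direct matches are considered: if row A matches B and B matches C, but A does not match C, then C is labelled with B's ID rather than A's. Chains like that would need an extra step, such as repeating the lookup until the labels stop changing.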

In reply to your comment, you can use an update query to mark the duplicates in the table:

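-- Note: DupeOfID is not in the CREATE TABLE from the question;
-- this query assumes the column has been added to Persons.
with    dupe as
        (
        -- for each duplicate row, find the lowest ID of an earlier matching row
        select  min(p1.ID) as OriginalID
        ,       p2.ID as DupeID
        from    Persons p1
        join    Persons p2
        on      p1.id < p2.id and
                p1.FirstName = p2.FirstName and
                p1.LastName = p2.LastName and
                (
                    p1.Address1 = p2.Address1 or
                    p1.BDay = p2.BDay or
                    p1.Email = p2.Email
                )
        group by
                p2.ID
        )
-- store the original's ID on every duplicate row
update  p1
set     DupeOfID = dupe.OriginalID
from    Persons p1
join    dupe
on      dupe.DupeID = p1.ID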

Working example on SQL Fiddle.
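The update writes to a DupeOfID column that the CREATE TABLE in the question does not define, so it presumably has to be added first. A minimal sketch, assuming a nullable int column is acceptable and that the statements are run as separate batches (SQL Server does not resolve a newly added column within the same batch); GroupID is a hypothetical alias:

```sql
-- Batch 1, before the update: add the column the update writes to.
-- Assumption: a nullable int is sufficient for DupeOfID.
alter table Persons add DupeOfID int null;

-- Batch 2, after the update has run: list each row with its group.
-- Rows that are not duplicates have DupeOfID = NULL and fall back
-- to their own ID.
select  coalesce(DupeOfID, ID) as GroupID
,       ID
,       FirstName
,       LastName
from    Persons
order by
        GroupID
,       ID;
```

With the sample data, rows 3, 4 and 5 should end up with DupeOfID = 2, so rows 2-5 share GroupID 2 and rows 1 and 6 stand alone.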