SQL script for merging customer accounts

Date: 2012-07-24 14:00:13

Tags: sql-server

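I came up with a script that selects a "Master" account and a "Slave" account wherever the company name and postal code match exactly. It treats the most recently updated account as the master.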

select
    m.ev870_acct_code, m.ev870_company_name, m.ev870_postal_code, m.ev870_iacvb_code,
    s.ev870_acct_code, s.ev870_company_name, s.ev870_postal_code, s.ev870_iacvb_code
from
    ev870_acct_master m
inner join
    ev870_acct_master s
on
    m.ev870_company_name = s.ev870_company_name
and m.ev870_postal_code = s.ev870_postal_code
and m.ev870_upd_stamp > s.ev870_upd_stamp
where
    m.ev870_class = 'o'
and s.ev870_class = 'o'
and m.ev870_status != '0'
and s.ev870_status != '0'
and (m.ev870_iacvb_code = s.ev870_iacvb_code or isnull(m.ev870_iacvb_code,'') = '' or isnull(s.ev870_iacvb_code,'') = '')
and s.ev870_company_name like '%council%'
order by
    m.ev870_upd_stamp desc

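The problem with the script is that, for three duplicate accounts, it may determine that:

  • Account 1 is the master and account 2 is a duplicate slave.
  • Account 1 is the master and account 3 is a duplicate slave.
  • Account 2 is the master and account 3 is a duplicate slave.

As you can see, the outcome of each merge affects the ones that follow: once account 2 has been merged into account 1, the third pair no longer makes sense. Can you recommend a smarter query?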
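To make the problem concrete, here is a minimal sketch that reproduces the chained pairs. The table variable `@acct`, its shortened column names, and the sample rows are illustrative assumptions, not the real `ev870_acct_master` schema.

```sql
-- Hypothetical sample data: three duplicates of the same account.
declare @acct table (
    acct_code int
    ,company_name varchar(50)
    ,postal_code varchar(50)
    ,upd_stamp datetime
);

insert into @acct (acct_code, company_name, postal_code, upd_stamp)
values
    (1, 'Acme Council', 'pc1', '20120701')
    ,(2, 'Acme Council', 'pc1', '20120601')
    ,(3, 'Acme Council', 'pc1', '20120501');

-- Same shape as the self-join above: every newer row pairs with every older row,
-- so account 2 shows up both as a slave (of 1) and as a master (of 3).
select
    m.acct_code as master_code
    ,s.acct_code as slave_code
from
    @acct m
inner join
    @acct s
on
    m.company_name = s.company_name
and m.postal_code = s.postal_code
and m.upd_stamp > s.upd_stamp
order by
    m.acct_code, s.acct_code;
```

With these three rows the join returns the pairs (1, 2), (1, 3) and (2, 3), which is exactly the overlap described in the list above.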

Edit, my solution:

select
    m.ev870_acct_code, m.ev870_company_name, m.ev870_postal_code, m.ev870_iacvb_code,
    s.ev870_acct_code, s.ev870_company_name, s.ev870_postal_code, s.ev870_iacvb_code
from
    ev870_acct_master s
inner join 
    (
    select 
        ev870_acct_code, ev870_company_name, ev870_postal_code, ev870_iacvb_code, ev870_upd_stamp
        ,row_number() over (partition by ev870_company_name, ev870_postal_code, ev870_iacvb_code order by ev870_upd_stamp desc) as howRecent
    from 
        ev870_acct_master
    where
        ev870_class = 'o'
    and ev870_status != '0'
    and ev870_postal_code != ''
    and ev870_company_name like 'A%'
    ) m 
on  
    m.ev870_company_name = s.ev870_company_name
and m.ev870_postal_code = s.ev870_postal_code
and m.ev870_upd_stamp > s.ev870_upd_stamp
where
    m.howRecent = 1
and m.ev870_iacvb_code = s.ev870_iacvb_code
and s.ev870_class = 'o'
and s.ev870_status != '0'

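1 answer:

Answer 0 (score: 0)

Following up on your comment:

  "@kristof I am identifying duplicate accounts in my application, which will then be merged into a single account."

You could use code similar to this: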

declare @dupExample table (
    id int identity(1,1)
    ,name varchar(50)
    ,postal varchar(50)
    ,lastUpdated datetime
)

insert into @dupExample(name, postal, lastUpdated)
values 
    ('a','pc1','20120101')
    ,('a','pc1','20120501')
    ,('a','pc1','20120601')
    ,('a','pc1','20120701')
    ,('a','pc1','20120201')
    ,('b','pc2','20120102')
    ,('b','pc2','20120202')
    ,('b','pc2','20120302')
    ,('b','pc2','20120302')
    ,('c','pc2','20120302')
    ,('d','pc2','20120302')
    ,('d','pc2','20120302')


select * from @dupExample 


/*
    to see duplicates along with how recent they are
*/
select 
    *
    ,row_number() over (partition by name, postal order by lastUpdated desc) as howRecent
from 
    @dupExample     

/*
    delete duplicates leaving only most recent record based on date updated
    WARNING only one record will be left for each dup even if there are multiple records 
    updated on the same date (see b, d examples)
*/
delete de
from    @dupExample de
    inner join 
    ( select 
        id
        ,row_number() over (partition by name, postal order by lastUpdated desc) as howRecent
        from 
            @dupExample     
    ) der on de.id = der.id
where
    der.howRecent > 1   

/*
    after delete
*/      
select * from @dupExample   

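If you have duplicates with the same lastUpdated value, you can add extra criteria to the ORDER BY clause to specify which record should be deleted in that case. Hopefully this gives you a good starting point.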
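As a minimal sketch of that tie-breaking idea, the delete below adds `id desc` as a second sort key, so that among rows sharing the same lastUpdated value the one with the highest id is kept. Using id as the tie-breaker is only an assumption for illustration, and the table variable `@tieExample` is a hypothetical name chosen so it does not clash with `@dupExample` above; pick whatever criterion suits your data.

```sql
-- Hypothetical sample: two of the three rows share the most recent date.
declare @tieExample table (
    id int identity(1,1)
    ,name varchar(50)
    ,postal varchar(50)
    ,lastUpdated datetime
);

insert into @tieExample (name, postal, lastUpdated)
values
    ('b','pc2','20120102')
    ,('b','pc2','20120302')
    ,('b','pc2','20120302');

-- Rank by date first, then by id, so ties are resolved the same way on every run.
delete te
from @tieExample te
    inner join
    ( select
        id
        ,row_number() over (partition by name, postal order by lastUpdated desc, id desc) as howRecent
      from
        @tieExample
    ) ter on te.id = ter.id
where
    ter.howRecent > 1;

-- One row per (name, postal) remains: the latest date, highest id among ties.
select * from @tieExample;
```

Without the second sort key, row_number() is free to number tied rows in any order, so which of the tied records survives could differ between runs.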