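I have a report that shows a list of duplicate accounts based on our business rules. It works when a new account matches other existing accounts. Where I run into trouble is when multiple new accounts match the same existing duplicate. Here is an example of what it looks like now, grouped by NewId: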
NewID  MatchedID  FirstName  LastName  AddDate    Address         PhoneNumber
10     10         Holly      Johnson   4/18/2013  123 1St Rd.     123 456 7890
10     2          Hollie     Johnson   1/1/1990   123 1St Rd.     123 456 7890
11     11         Holley     Johnson   4/17/2013  123 1St Rd.     123-456-7890
11     2          Hollie     Johnson   1/1/1990   123 First Rd.   123 456 7890
50     50         William    Johnson   4/17/2013  999 2nd St.     222 222 2222
50     3          Bill       Jonson    1/2/1990   999 Second St.  222-222-2222
Each new account is also listed against itself (NewID = MatchedID) for comparison.
So, is there a way to group these similar accounts together without duplicates? It should look like this:
GroupID  AcctID  FirstName  LastName  AddDate    Address         PhoneNumber
1        2       Hollie     Johnson   1/1/1990   123 First Rd.   123 456 7890
1        10      Holly      Johnson   4/18/2013  123 1St Rd.     123 456 7890
1        11      Holley     Johnson   4/17/2013  123 1St Rd.     123-456-7890
2        50      William    Johnson   4/17/2013  999 2nd St.     222 222 2222
2        3       Bill       Jonson    1/2/1990   999 Second St.  222-222-2222
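I don't care whether the grouping is done in SQL or in SSRS. It needs to reference both ID columns, because the names, addresses and phone numbers can differ. I also need to assign a new GroupID so that the accounts can be grouped in the report.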
Answer (score: 1)
You can use a ranking function to eliminate the duplicate rows:
with NoDuplicates as
(
select *
, rownum = row_number() over (partition by MatchedID order by NewID)
from Accounts
)
select NewID
, MatchedID
, FirstName
, LastName
, AddDate
, Address
, phoneNumber
from NoDuplicates where rownum = 1
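The following minimal sketch runs the same ROW_NUMBER query against the question's sample rows so you can see which rows it keeps. It assumes Python's built-in `sqlite3` module with SQLite 3.25 or later, which is needed for window functions. SQLite stands in for SQL Server here, so T-SQL's `rownum = row_number() ...` alias syntax is written as `... AS rownum`. The table name `Accounts` comes from the answer. The helper names `load_accounts` and `dedupe` are hypothetical.

```python
import sqlite3

# Sample rows from the question:
# (NewID, MatchedID, FirstName, LastName, AddDate, Address, PhoneNumber)
ROWS = [
    (10, 10, "Holly", "Johnson", "4/18/2013", "123 1St Rd.", "123 456 7890"),
    (10, 2, "Hollie", "Johnson", "1/1/1990", "123 1St Rd.", "123 456 7890"),
    (11, 11, "Holley", "Johnson", "4/17/2013", "123 1St Rd.", "123-456-7890"),
    (11, 2, "Hollie", "Johnson", "1/1/1990", "123 First Rd.", "123 456 7890"),
    (50, 50, "William", "Johnson", "4/17/2013", "999 2nd St.", "222 222 2222"),
    (50, 3, "Bill", "Jonson", "1/2/1990", "999 Second St.", "222-222-2222"),
]


def load_accounts():
    """Hypothetical helper: an in-memory SQLite stand-in for the Accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Accounts (NewID INTEGER, MatchedID INTEGER, FirstName TEXT,"
        " LastName TEXT, AddDate TEXT, Address TEXT, PhoneNumber TEXT)"
    )
    conn.executemany("INSERT INTO Accounts VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


# Number the rows within each MatchedID (lowest NewID first) and keep the first.
DEDUPE_SQL = """
WITH NoDuplicates AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY MatchedID ORDER BY NewID) AS rownum
    FROM Accounts
)
SELECT NewID, MatchedID, FirstName, LastName
FROM NoDuplicates
WHERE rownum = 1
ORDER BY NewID, MatchedID
"""


def dedupe():
    """Hypothetical helper: run the ROW_NUMBER de-duplication query."""
    conn = load_accounts()
    try:
        return conn.execute(DEDUPE_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in dedupe():
        print(row)
```

Existing account 2 survives only once, from NewID 10. The row pairing it with NewID 11 is dropped because it is second in its MatchedID partition.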
That said, there's no reason you couldn't just use GROUP BY, assuming the address information is also always exactly duplicated:
select NewID = min(NewID)
, MatchedID
, FirstName
, LastName
, AddDate
, Address
, phoneNumber
from Accounts
group by MatchedID
, FirstName
, LastName
, AddDate
, Address
, phoneNumber
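Here is a sketch of the GROUP BY variant under the same assumptions: Python `sqlite3` standing in for SQL Server, hypothetical helper names, and `expr AS alias` instead of T-SQL's `alias = expr`. It also shows why the assumption matters. In the question's sample, existing account 2 appears once with `123 1St Rd.` and once with `123 First Rd.`. Those two rows therefore fall into different groups, and six rows come back instead of five.

```python
import sqlite3

# Sample rows from the question:
# (NewID, MatchedID, FirstName, LastName, AddDate, Address, PhoneNumber)
ROWS = [
    (10, 10, "Holly", "Johnson", "4/18/2013", "123 1St Rd.", "123 456 7890"),
    (10, 2, "Hollie", "Johnson", "1/1/1990", "123 1St Rd.", "123 456 7890"),
    (11, 11, "Holley", "Johnson", "4/17/2013", "123 1St Rd.", "123-456-7890"),
    (11, 2, "Hollie", "Johnson", "1/1/1990", "123 First Rd.", "123 456 7890"),
    (50, 50, "William", "Johnson", "4/17/2013", "999 2nd St.", "222 222 2222"),
    (50, 3, "Bill", "Jonson", "1/2/1990", "999 Second St.", "222-222-2222"),
]


def load_accounts():
    """Hypothetical helper: an in-memory SQLite stand-in for the Accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Accounts (NewID INTEGER, MatchedID INTEGER, FirstName TEXT,"
        " LastName TEXT, AddDate TEXT, Address TEXT, PhoneNumber TEXT)"
    )
    conn.executemany("INSERT INTO Accounts VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


# Rows collapse only when every grouped column is identical.
GROUP_BY_SQL = """
SELECT MIN(NewID) AS NewID, MatchedID, FirstName, LastName,
       AddDate, Address, PhoneNumber
FROM Accounts
GROUP BY MatchedID, FirstName, LastName, AddDate, Address, PhoneNumber
ORDER BY MatchedID, NewID
"""


def dedupe_with_group_by():
    """Hypothetical helper: run the GROUP BY de-duplication query."""
    conn = load_accounts()
    try:
        return conn.execute(GROUP_BY_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in dedupe_with_group_by():
        print(row)
```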
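Both of these return your expected de-duplicated list, although the GROUP BY version only does so when the assumption above holds.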
Edit after comments:
You can group the related rows with a statement like this:
with NoDuplicates as
(
select *
, rownum = row_number() over (partition by MatchedID order by NewID)
from Accounts
where NewID <> MatchedID
)
select groupID = MatchedID
, Acct = MatchedID
, FirstName
, AddDate
, Address
, phoneNumber
from NoDuplicates where rownum = 1
union all
select groupID = coalesce(am.MatchedID, a.NewID)
, Acct = a.MatchedID
, a.FirstName
, a.AddDate
, a.Address
, a.phoneNumber
from Accounts a
-- join to the corresponding matched account
left join Accounts am on a.MatchedID = am.NewID and am.NewID <> am.MatchedID
where a.NewID = a.MatchedID
order by groupID, Acct
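The grouping query can be tried out the same way. This sketch ports it to SQLite through Python's `sqlite3` module. It assumes SQLite 3.25+ and uses `expr AS alias` in place of the T-SQL alias syntax. For brevity it selects only the name columns. `load_accounts` and `group_accounts` are hypothetical helpers.

```python
import sqlite3

# Sample rows from the question:
# (NewID, MatchedID, FirstName, LastName, AddDate, Address, PhoneNumber)
ROWS = [
    (10, 10, "Holly", "Johnson", "4/18/2013", "123 1St Rd.", "123 456 7890"),
    (10, 2, "Hollie", "Johnson", "1/1/1990", "123 1St Rd.", "123 456 7890"),
    (11, 11, "Holley", "Johnson", "4/17/2013", "123 1St Rd.", "123-456-7890"),
    (11, 2, "Hollie", "Johnson", "1/1/1990", "123 First Rd.", "123 456 7890"),
    (50, 50, "William", "Johnson", "4/17/2013", "999 2nd St.", "222 222 2222"),
    (50, 3, "Bill", "Jonson", "1/2/1990", "999 Second St.", "222-222-2222"),
]


def load_accounts():
    """Hypothetical helper: an in-memory SQLite stand-in for the Accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Accounts (NewID INTEGER, MatchedID INTEGER, FirstName TEXT,"
        " LastName TEXT, AddDate TEXT, Address TEXT, PhoneNumber TEXT)"
    )
    conn.executemany("INSERT INTO Accounts VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


GROUPING_SQL = """
WITH NoDuplicates AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY MatchedID ORDER BY NewID) AS rownum
    FROM Accounts
    WHERE NewID <> MatchedID
)
-- one row per existing (matched) account, labelled with its own ID
SELECT MatchedID AS GroupID, MatchedID AS Acct, FirstName, LastName
FROM NoDuplicates
WHERE rownum = 1
UNION ALL
-- each new account, labelled with the existing account it matched
SELECT COALESCE(am.MatchedID, a.NewID) AS GroupID, a.MatchedID AS Acct,
       a.FirstName, a.LastName
FROM Accounts AS a
LEFT JOIN Accounts AS am
       ON a.MatchedID = am.NewID AND am.NewID <> am.MatchedID
WHERE a.NewID = a.MatchedID
ORDER BY GroupID, Acct
"""


def group_accounts():
    """Hypothetical helper: run the grouping query."""
    conn = load_accounts()
    try:
        return conn.execute(GROUPING_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in group_accounts():
        print(row)
```

Accounts 10 and 11 both land in group 2, because each one's match row points at existing account 2. Account 50 lands in group 3.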
However, this is basically just grouping by MatchedID. If you want the group numbers to start at 1, you can add a DENSE_RANK clause to the statement:
with NoDuplicates as
(
select *
, rownum = row_number() over (partition by MatchedID order by NewID)
from Accounts
where NewID <> MatchedID
)
, GroupedAcct as
(
select GroupID = MatchedID
, Acct = MatchedID
, FirstName
, AddDate
, Address
, phoneNumber
from NoDuplicates where rownum = 1
union all
select GroupID = coalesce(am.MatchedID, a.NewID)
, Acct = a.MatchedID
, a.FirstName
, a.AddDate
, a.Address
, a.phoneNumber
from Accounts a
-- join to the corresponding matched account
left join Accounts am on a.MatchedID = am.NewID and am.NewID <> am.MatchedID
where a.NewID = a.MatchedID
)
select GroupID = Dense_Rank() over (order by GroupID)
, Acct
, FirstName
, AddDate
, Address
, phoneNumber
from GroupedAcct
order by groupID, Acct
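Finally, here is a sketch of the DENSE_RANK version under the same assumptions: Python `sqlite3` with SQLite 3.25+, name columns only, and hypothetical helper names. DENSE_RANK gives tied GroupID values the same number and leaves no gaps, so the raw group labels 2 and 3 become 1 and 2.

```python
import sqlite3

# Sample rows from the question:
# (NewID, MatchedID, FirstName, LastName, AddDate, Address, PhoneNumber)
ROWS = [
    (10, 10, "Holly", "Johnson", "4/18/2013", "123 1St Rd.", "123 456 7890"),
    (10, 2, "Hollie", "Johnson", "1/1/1990", "123 1St Rd.", "123 456 7890"),
    (11, 11, "Holley", "Johnson", "4/17/2013", "123 1St Rd.", "123-456-7890"),
    (11, 2, "Hollie", "Johnson", "1/1/1990", "123 First Rd.", "123 456 7890"),
    (50, 50, "William", "Johnson", "4/17/2013", "999 2nd St.", "222 222 2222"),
    (50, 3, "Bill", "Jonson", "1/2/1990", "999 Second St.", "222-222-2222"),
]


def load_accounts():
    """Hypothetical helper: an in-memory SQLite stand-in for the Accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Accounts (NewID INTEGER, MatchedID INTEGER, FirstName TEXT,"
        " LastName TEXT, AddDate TEXT, Address TEXT, PhoneNumber TEXT)"
    )
    conn.executemany("INSERT INTO Accounts VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn


NUMBERED_SQL = """
WITH NoDuplicates AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY MatchedID ORDER BY NewID) AS rownum
    FROM Accounts
    WHERE NewID <> MatchedID
),
GroupedAcct AS (
    SELECT MatchedID AS GroupID, MatchedID AS Acct, FirstName, LastName
    FROM NoDuplicates
    WHERE rownum = 1
    UNION ALL
    SELECT COALESCE(am.MatchedID, a.NewID), a.MatchedID, a.FirstName, a.LastName
    FROM Accounts AS a
    LEFT JOIN Accounts AS am
           ON a.MatchedID = am.NewID AND am.NewID <> am.MatchedID
    WHERE a.NewID = a.MatchedID
)
-- renumber the raw group labels 1, 2, 3, ... without gaps
SELECT DENSE_RANK() OVER (ORDER BY g.GroupID) AS GroupID,
       g.Acct, g.FirstName, g.LastName
FROM GroupedAcct AS g
ORDER BY GroupID, Acct
"""


def numbered_groups():
    """Hypothetical helper: run the DENSE_RANK grouping query."""
    conn = load_accounts()
    try:
        return conn.execute(NUMBERED_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in numbered_groups():
        print(row)
```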