I'm trying to list and group possible duplicates from a Person table.
The schema looks like this:
Id  LastName   OriginalName  FirstName
---------------------------------------------
1   Nolte      Huber         Silvia
2   Nolte                    Johann
3   Huber                    Milan
4   Huber                    Silvia
5   Abacherli                Adrian
6   Abächerli                Adrian
7   Meier                    Hans
8   Meier                    Urs
9   Meyer                    Hans
10  Meier                    Urs
11  Hermann                  Marco
12  Huber                    Milan
13  Meyer                    Hans
Expected result:
GroupNumber  Id  LastName   OriginalName  FirstName
-----------------------------------------------------------
1            5   Abacherli                Adrian
1            6   Abächerli                Adrian
2            3   Huber                    Milan
2            12  Huber                    Milan
3            4   Huber                    Silvia
3            1   Nolte      Huber         Silvia
4            7   Meier                    Hans
4            9   Meyer                    Hans
4            13  Meyer                    Hans
5            8   Meier                    Urs
5            10  Meier                    Urs
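Explanation:
I want to group matching rows and list them in a grid in a web application (ASP.NET MVC). To be considered duplicates, two rows must have at least:
the same LastName and the same FirstName,
or a LastName that matches the other row's OriginalName, and the same FirstName.
To make things more complicated, "same" means a phonetic match (i.e. via SOUNDEX or a similar function): Meyer == Meier == meier.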
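To make the phonetic requirement concrete, here is a small check, assuming SQL Server and its built-in SOUNDEX and DIFFERENCE functions (the column aliases are only illustrative). If the three spellings are phonetically "the same" in the sense above, the three SOUNDEX codes should be identical.

```sql
-- Illustrative check only: compare the SOUNDEX codes of the three spellings.
SELECT SOUNDEX('Meyer') AS Meyer,
       SOUNDEX('Meier') AS Meier,
       SOUNDEX('meier') AS meier_lower;

-- DIFFERENCE compares the SOUNDEX codes of two strings and returns 0 to 4,
-- where 4 means the codes are most similar.
SELECT DIFFERENCE('Meyer', 'Meier') AS Similarity;
```

How accented characters such as the "ä" in Abächerli are treated may depend on the database collation, so that case is worth testing separately.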
Technologies in use:
Expected answer:
So far I have worked out the whole approach except for the GroupNumber. Here is a (non-working) query:
SELECT
Id, LastName, FirstName
FROM
Person p1,
(SELECT
p1.Id AS Id1
FROM Person p1
INNER JOIN Person p2
ON (p1.LastName LIKE p2.LastName OR p1.LastName LIKE p2.OriginalName) AND p1.FirstName LIKE p2.FirstName AND p1.Id <> p2.Id
GROUP BY p1.Id
HAVING COUNT(*) > 1) AS p2
WHERE
p1.Id IN (SELECT Id1)
ORDER BY
p1.LastName, FirstName, Id
Answer 0 (score: 1)
How about this:
MS SQL Server 2012 Schema Setup:
CREATE TABLE Person
( ID Int,
LastName Varchar(50),
OriginalName Varchar(50),
FirstName varchar(50)
)
INSERT INTO Person
VALUES
(1, 'Nolte', 'Huber','Silvia'),
(2,'Nolte', '', 'Johann'),
(3,'Huber', '', 'Milan'),
(4,'Huber', '', 'Silvia'),
(5,'Abacherli', '', 'Adrian'),
(6,'Abacherli', '', 'Adrian'),
(7,'Meier', '', 'Hans'),
(8,'Meier', '', 'Urs'),
(9,'Meyer', '', 'Hans'),
(10,'Meier', '', 'Urs'),
(11,'Hermann', '', 'Marco'),
(12,'Huber', '', 'Milan'),
(13,'Meyer', '', 'Hans')
Query 1:
;WITH PersonCTE
AS
(
SELECT ID, SOUNDEX(LastName) AS LastNameSDX, LastName, OriginalName, SOUNDEX(FirstName) FirstNameSDX, FirstName
FROM Person
UNION ALL
SELECT ID, SOUNDEX(OriginalName) AS LastNameSDX, LastName, OriginalName, SOUNDEX(FirstName) FirstNameSDX, FirstName
FROM Person
WHERE OriginalName <> ''
),
PersonRankCTE
AS
(
SELECT DENSE_RANK() OVER (ORDER BY LastNameSDX, FirstNameSdx) AS Grp, *
FROM PersonCTE
)
SELECT DENSE_RANK() OVER(ORDER BY grp) AS Grp, ID, LastName, OriginalName, FirstName
FROM PersonRankCTE P1
WHERE (SELECT COUNT(*) FROM PersonRankCTE P2 WHERE P1.grp = P2.grp) > 1
Results:
| GRP | ID | LASTNAME | ORIGINALNAME | FIRSTNAME |
|-----|----|-----------|--------------|-----------|
| 1 | 5 | Abacherli | | Adrian |
| 1 | 6 | Abacherli | | Adrian |
| 2 | 3 | Huber | | Milan |
| 2 | 12 | Huber | | Milan |
| 3 | 1 | Nolte | Huber | Silvia |
| 3 | 4 | Huber | | Silvia |
| 4 | 13 | Meyer | | Hans |
| 4 | 9 | Meyer | | Hans |
| 4 | 7 | Meier | | Hans |
| 5 | 8 | Meier | | Urs |
| 5 | 10 | Meier | | Urs |
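As a possible variation on the query above, the correlated COUNT(*) subquery could be replaced by a windowed COUNT. This is only a sketch against the same Person table; the names PersonKeys, Counted and GroupSize are made up for illustration and are not part of the original answer.

```sql
-- Sketch: same grouping idea, but group sizes come from a window function.
WITH PersonKeys AS
(
    -- One row per person keyed by the SOUNDEX of LastName ...
    SELECT ID, LastName, OriginalName, FirstName,
           SOUNDEX(LastName)  AS LastNameSDX,
           SOUNDEX(FirstName) AS FirstNameSDX
    FROM Person
    UNION ALL
    -- ... plus one extra row keyed by the SOUNDEX of OriginalName, if present.
    SELECT ID, LastName, OriginalName, FirstName,
           SOUNDEX(OriginalName),
           SOUNDEX(FirstName)
    FROM Person
    WHERE OriginalName <> ''
),
Counted AS
(
    -- Count how many rows share each (last name code, first name code) pair.
    SELECT *,
           COUNT(*) OVER (PARTITION BY LastNameSDX, FirstNameSDX) AS GroupSize
    FROM PersonKeys
)
-- Keep only pairs that occur more than once and number the surviving groups.
SELECT DENSE_RANK() OVER (ORDER BY LastNameSDX, FirstNameSDX) AS GroupNumber,
       ID, LastName, OriginalName, FirstName
FROM Counted
WHERE GroupSize > 1
ORDER BY GroupNumber, ID;
```

Because the window functions in the final SELECT are evaluated after the WHERE filter, DENSE_RANK numbers only the groups that remain, so a second renumbering pass is not needed. Like the original query, this would list a person twice if their LastName and OriginalName happen to share a SOUNDEX code.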
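Answer 1 (score: 0)
Maybe (probably?) overcomplicated, but...
I made two CTEs:
one to get all Person fields together with the Soundex codes of LastName and OriginalName;
one to create the groups and get the GroupNumber. It does a UNION ALL into a single "column" of the "soundexed" LastName and OriginalName (keeping only the duplicates).
So: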
with cte as (select
id,
LastName,
OriginalName,
soundex(LastName) as sdxLastName,
soundex(OriginalName) as sdxOriginalName,
FirstName
from Person),
grp as (select lname, FirstName, row_number() over(order by lname) rn
from (
select
sdxOriginalName as lname,
FirstName from cte
where sdxOriginalName is not null
union all
select
sdxLastName as lname,
FirstName from cte) s
group by lname, FirstName
having count(*) > 1)
select
g.rn as GroupNumber,
p.Id,
p.LastName,
p.OriginalName,
p.FirstName
from grp g
join cte p on p.firstName = g.FirstName and
(sdxLastName = g.lname or sdxOriginalName = g.lname)
order by rn
See the SQLFiddle.
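One thing to watch with the `sdxOriginalName is not null` filter: it relies on missing original names being stored as NULL. If they are stored as empty strings instead (as in the schema setup of the other answer), the filter may not skip them. A minimal sketch of a first CTE body that normalizes this, assuming blank means "no original name":

```sql
-- Assumption: missing original names may be stored as '' rather than NULL.
SELECT Id,
       LastName,
       OriginalName,
       SOUNDEX(LastName) AS sdxLastName,
       -- NULLIF turns '' into NULL, and SOUNDEX of NULL is NULL,
       -- so the later "is not null" filter skips blank original names.
       SOUNDEX(NULLIF(OriginalName, '')) AS sdxOriginalName,
       FirstName
FROM Person;
```

Note also that this answer matches FirstName by exact equality rather than phonetically; comparing SOUNDEX(FirstName) instead would be one way to bring it in line with the question's definition of "same".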