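I need to find duplicates across two tables based on a custom condition. The rule below determines whether a record is a duplicate and, if it is, only the most recent one should be shown:
If an employee's Name and all of their EmployeePolicy CoverageId(s) exactly match those of another record, the record is considered a duplicate.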
--Employee Table
EmployeeId Name Salary
543 John 54000
785 Alex 63000
435 John 75000
123 Alex 88000
333 John 67000
--EmployeePolicy Table
EmployeePolicyId EmployeeId CoverageId
1 543 8888
2 543 7777
3 785 5555
4 435 8888
5 435 7777
6 123 4444
7 333 8888
8 333 7776
For example, the duplicates in the sample above are:
EmployeeId Name Salary
543 John 54000
435 John 75000
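That is because they are the only rows in the Employee table with matching names that also have the same CoverageIds in the EmployeePolicy table.
Note: EmployeeId 333 also has Name = John, but it is not a match, because its two CoverageIds are not the same as the other Johns' CoverageIds.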
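For anyone who wants to reproduce this, here is a setup script for the sample data above. The column types and constraints are assumptions, since only the values are given in the tables.

```sql
-- Sample data from the question. Types and keys are assumed, not given.
CREATE TABLE Employee (
    EmployeeId INT PRIMARY KEY,
    Name       VARCHAR(50) NOT NULL,
    Salary     INT NOT NULL
);

CREATE TABLE EmployeePolicy (
    EmployeePolicyId INT PRIMARY KEY,
    EmployeeId       INT NOT NULL REFERENCES Employee (EmployeeId),
    CoverageId       INT NOT NULL
);

INSERT INTO Employee (EmployeeId, Name, Salary) VALUES
    (543, 'John', 54000),
    (785, 'Alex', 63000),
    (435, 'John', 75000),
    (123, 'Alex', 88000),
    (333, 'John', 67000);

INSERT INTO EmployeePolicy (EmployeePolicyId, EmployeeId, CoverageId) VALUES
    (1, 543, 8888),
    (2, 543, 7777),
    (3, 785, 5555),
    (4, 435, 8888),
    (5, 435, 7777),
    (6, 123, 4444),
    (7, 333, 8888),
    (8, 333, 7776);
```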
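At first I tried the old-fashioned way of finding duplicates, grouping the records and filtering with HAVING COUNT(*) > 1, but I quickly realized that it would not work: my criteria define a duplicate in English terms, whereas in SQL each CoverageId is a separate row with a different value, so the records are not treated as duplicates.
By the same token, I tried something like this: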
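-- Create a temp table (SELECT ... INTO creates #tmp; explicit columns avoid a duplicate EmployeeId column)
SELECT e.EmployeeId, e.Name, e.Salary, ep.CoverageId
INTO #tmp
FROM Employee e
JOIN EmployeePolicy ep ON e.EmployeeId = ep.EmployeeId;

-- Keep the row with the highest EmployeeId per (Name, CoverageId)
SELECT info.*
FROM
(
    SELECT
        tmp.*,
        ROW_NUMBER() OVER (PARTITION BY tmp.Name, tmp.CoverageId ORDER BY tmp.EmployeeId DESC) AS RowNum
    FROM #tmp tmp
) info
WHERE
    info.RowNum = 1;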
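Again, this does not work, because SQL does not see these rows as duplicates. I am not sure how to translate my English definition of a duplicate into a SQL definition of a duplicate.
Any help is greatly appreciated.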
Answer 0 (score: 3)
The simplest method is to concatenate the policies into a string. Alas, that is cumbersome in SQL Server. Here is a set-based approach:
with ep as (
      -- attach each employee's total number of coverages to every policy row
      select ep.*, count(*) over (partition by employeeid) as cnt
      from employeepolicy ep
     )
select ep.employeeid as employeeid1, ep2.employeeid as employeeid2
from ep join
     ep ep2
     on ep.employeeid < ep2.employeeid and
        ep.CoverageId = ep2.CoverageId and
        ep.cnt = ep2.cnt
group by ep.employeeid, ep2.employeeid, ep.cnt
having count(*) = ep.cnt  -- all match (cnt is qualified to avoid an ambiguous column name)
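The idea is to match the coverages of different employees. A simple first criterion is that the number of coverages has to match. The query then checks that the number of matching coverages equals the actual number of coverages.
Note: This puts each pair of employee IDs in a single row. You can join to the Employee table to get additional information.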
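As a rough sketch of that last step, the pairs can be joined back to Employee, which also brings in the name condition from the question. The CTE name pairs and the aliases e1, e2, and e are made up for illustration, and the query assumes an employee never has the same CoverageId twice.

```sql
WITH ep AS (
    -- each policy row plus the employee's total number of coverages
    SELECT ep.EmployeeId, ep.CoverageId,
           COUNT(*) OVER (PARTITION BY ep.EmployeeId) AS cnt
    FROM EmployeePolicy ep
),
pairs AS (
    -- employee pairs whose coverage sets are identical
    SELECT ep.EmployeeId AS EmployeeId1, ep2.EmployeeId AS EmployeeId2
    FROM ep
    JOIN ep ep2
      ON ep.EmployeeId < ep2.EmployeeId
     AND ep.CoverageId = ep2.CoverageId
     AND ep.cnt = ep2.cnt
    GROUP BY ep.EmployeeId, ep2.EmployeeId, ep.cnt
    HAVING COUNT(*) = ep.cnt
)
SELECT DISTINCT e.EmployeeId, e.Name, e.Salary
FROM pairs p
JOIN Employee e1 ON e1.EmployeeId = p.EmployeeId1
JOIN Employee e2 ON e2.EmployeeId = p.EmployeeId2
               AND e2.Name = e1.Name          -- names must match too
JOIN Employee e  ON e.EmployeeId IN (p.EmployeeId1, p.EmployeeId2)
ORDER BY e.EmployeeId;
```

On the sample data from the question this should list employees 435 and 543, since they are the only pair with the same name and the same two coverages.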
Answer 1 (score: 0)
I haven't tested this T-SQL, but I believe the following should give you the desired output.
;WITH CTE_Employee
AS
(
SELECT E.[Name]
,E.[EmployeeId]
,P.[CoverageId]
,E.[Salary]
FROM Employee E
INNER JOIN EmployeePolicy P ON E.EmployeeId = P.EmployeeId
)
, CTE_DuplicateCoverage
AS
(
SELECT E.[Name]
,E.[CoverageId]
FROM CTE_Employee E
GROUP BY E.[Name], E.[CoverageId]
HAVING COUNT(*) > 1
)
SELECT E.[EmployeeId]
,E.[Name]
,MAX(E.[Salary]) AS [Salary]
FROM CTE_Employee E
INNER JOIN CTE_DuplicateCoverage D ON E.[Name] = D.[Name] AND E.[CoverageId] = D.[CoverageId]
GROUP BY E.[EmployeeId], E.[Name]
HAVING COUNT(*) > 1
ORDER BY E.[EmployeeId]
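For comparison, the string-concatenation idea mentioned in the other answer can be sketched with STRING_AGG, which is available from SQL Server 2017 onward. This is only a sketch under a few assumptions: CoverageId is an integer column, an employee never has the same CoverageId twice, and the names sig, flagged, Coverages, and DupCount are invented for illustration.

```sql
-- Requires SQL Server 2017+ (STRING_AGG ... WITHIN GROUP).
WITH sig AS (
    -- one row per employee with a sorted, comma-separated coverage list
    SELECT e.EmployeeId, e.Name, e.Salary,
           STRING_AGG(CAST(ep.CoverageId AS VARCHAR(20)), ',')
               WITHIN GROUP (ORDER BY ep.CoverageId) AS Coverages
    FROM Employee e
    JOIN EmployeePolicy ep ON ep.EmployeeId = e.EmployeeId
    GROUP BY e.EmployeeId, e.Name, e.Salary
),
flagged AS (
    -- count how many employees share the same name and coverage list
    SELECT s.*,
           COUNT(*) OVER (PARTITION BY s.Name, s.Coverages) AS DupCount
    FROM sig s
)
SELECT EmployeeId, Name, Salary
FROM flagged
WHERE DupCount > 1
ORDER BY EmployeeId;
```

Sorting inside the aggregate matters, because two employees with the same coverages in a different row order would otherwise produce different strings. To keep only the latest row of each duplicate group, COUNT(*) could be swapped for ROW_NUMBER() OVER (PARTITION BY Name, Coverages ORDER BY EmployeeId DESC), assuming a higher EmployeeId means a more recent record, as the question's own attempt suggests.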