Delete redundant rows and update external references

Posted: 2014-11-19 09:47:07

Tags: sql sql-server

I have an SQL table named tblEmployees that looks like this:

sid primaryId   secondaryId employeeId  employeeName       timestamp
1   123         40          1           Eastwood, Clint    20141016124013
2   123         40          1           Eastwood, Clint    20141016130043
3   123         40          1           Westwood, Clint    20141016165733
4   123         40          1           Westwood, Clint    20141016210205

I also have a table named tblEmployeeData whose column employeeIdentificataion references the sid column of tblEmployees.

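tblEmployeeData looks like this:

sid employeeIdentificataion data
86  4                       [binary data]
89  2                       [binary data]
90  1                       [binary data]
104 3                       [binary data]

Now I need to delete the redundant rows in tblEmployees and update the references in tblEmployeeData so that they point to the youngest entry in tblEmployees. To identify the youngest entry I can use the timestamp. To identify duplicates I have to use the columns primaryId, secondaryId, employeeId and employeeName.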
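To make the required mapping concrete, here is a minimal in-memory sketch in Python (not part of the question). It assumes that "youngest" means the largest timestamp and that a duplicate is any row sharing primaryId, secondaryId, employeeId and employeeName; the helper names `dedup_key` and `surviving_sid` are hypothetical. It maps every sid to the surviving sid of its group and applies that mapping to the sample tblEmployeeData rows.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory copy of tblEmployees:
# (sid, primaryId, secondaryId, employeeId, employeeName, timestamp)
Row = Tuple[int, int, int, int, str, int]
Key = Tuple[int, int, int, str]


def dedup_key(row: Row) -> Key:
    """Columns that define a duplicate: primaryId, secondaryId, employeeId, employeeName."""
    return (row[1], row[2], row[3], row[4])


def surviving_sid(rows: List[Row]) -> Dict[int, int]:
    """Map every sid to the sid of the newest (largest timestamp) row in its group."""
    newest: Dict[Key, Row] = {}
    for row in rows:
        key = dedup_key(row)
        if key not in newest or row[5] > newest[key][5]:
            newest[key] = row
    return {row[0]: newest[dedup_key(row)][0] for row in rows}


employees: List[Row] = [
    (1, 123, 40, 1, "Eastwood, Clint", 20141016124013),
    (2, 123, 40, 1, "Eastwood, Clint", 20141016130043),
    (3, 123, 40, 1, "Westwood, Clint", 20141016165733),
    (4, 123, 40, 1, "Westwood, Clint", 20141016210205),
]
# (sid, employeeIdentificataion) from tblEmployeeData; the binary column is omitted
employee_data = [(86, 4), (89, 2), (90, 1), (104, 3)]

mapping = surviving_sid(employees)
updated = [(sid, mapping[ref]) for sid, ref in employee_data]
kept = sorted(set(mapping.values()))

print(mapping)  # {1: 2, 2: 2, 3: 4, 4: 4}
print(updated)  # [(86, 4), (89, 2), (90, 2), (104, 4)]
print(kept)     # [2, 4]
```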

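The background to this problem is that our application adds a new record to tblEmployees every time employee data is added. We need to know whether employeeName has changed. Unfortunately, we cannot check for a changed name before the new record is inserted.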

I could do this in C# over an SQL connection. Unfortunately, for performance reasons, I have to do it in SQL.

Can anyone give me some hints or help on how to solve this?

2 Answers:

Answer 0 (score: 1)

This will do the job:

;WITH cte_todelete
     AS (SELECT *,
                ROW_NUMBER()
                  OVER (
                    partition BY primaryId, secondaryId, employeeId, employeeName
                    ORDER BY sid) AS rn
         FROM   tblEmployees)
DELETE FROM cte_todelete
WHERE  rn > 1 

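ROW_NUMBER() assigns an incrementing number, starting at 1, to the rows within each group of primaryId, secondaryId, employeeId, employeeName, in ascending order of sid. Deleting from the CTE then removes every row except the first one in each group.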
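The numbering can be checked on the question's sample rows. The sketch below uses Python's built-in sqlite3 module with SQLite (3.25 or later, which supports ROW_NUMBER()) as a stand-in for SQL Server, so it only approximates T-SQL behaviour; the binary data column is left out.

```python
import sqlite3

# SQLite stands in for SQL Server; window functions need SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblEmployees (sid INTEGER PRIMARY KEY, primaryId INTEGER, "
    "secondaryId INTEGER, employeeId INTEGER, employeeName TEXT, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO tblEmployees VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 123, 40, 1, "Eastwood, Clint", 20141016124013),
        (2, 123, 40, 1, "Eastwood, Clint", 20141016130043),
        (3, 123, 40, 1, "Westwood, Clint", 20141016165733),
        (4, 123, 40, 1, "Westwood, Clint", 20141016210205),
    ],
)

# Same numbering as the answer's CTE: restarts at 1 per group, ascending sid.
numbered = conn.execute(
    """SELECT sid, employeeName,
              ROW_NUMBER() OVER (
                  PARTITION BY primaryId, secondaryId, employeeId, employeeName
                  ORDER BY sid) AS rn
       FROM tblEmployees
       ORDER BY sid"""
).fetchall()
for row in numbered:
    print(row)
# (1, 'Eastwood, Clint', 1)
# (2, 'Eastwood, Clint', 2)
# (3, 'Westwood, Clint', 1)
# (4, 'Westwood, Clint', 2)

# Rows with rn > 1 are the ones "DELETE FROM cte_todelete WHERE rn > 1" removes.
to_delete = [sid for sid, _, rn in numbered if rn > 1]
print(to_delete)  # [2, 4]
```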

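Note: replace ORDER BY sid with whatever criterion decides which row should stay in the table, e.g. order by timestamp desc to keep the newest row or order by timestamp to keep the oldest.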
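To see how the ORDER BY choice decides which row survives, this sketch reruns the CTE delete on the sample data with both orderings. It assumes SQLite via Python's sqlite3 as a stand-in for SQL Server; because SQLite cannot DELETE from a CTE directly, the CTE only supplies the sids to delete. The `survivors` helper is hypothetical.

```python
import sqlite3
from typing import List

ROWS = [
    (1, 123, 40, 1, "Eastwood, Clint", 20141016124013),
    (2, 123, 40, 1, "Eastwood, Clint", 20141016130043),
    (3, 123, 40, 1, "Westwood, Clint", 20141016165733),
    (4, 123, 40, 1, "Westwood, Clint", 20141016210205),
]


def survivors(order_by: str) -> List[int]:
    """Run the ROW_NUMBER() dedup with the given ORDER BY; return the sids left over."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tblEmployees (sid INTEGER PRIMARY KEY, primaryId INTEGER, "
        "secondaryId INTEGER, employeeId INTEGER, employeeName TEXT, timestamp INTEGER)"
    )
    conn.executemany("INSERT INTO tblEmployees VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    # order_by is interpolated directly: acceptable for this fixed demo, never for user input.
    conn.execute(
        f"""WITH cte AS (
                SELECT sid, ROW_NUMBER() OVER (
                    PARTITION BY primaryId, secondaryId, employeeId, employeeName
                    ORDER BY {order_by}) AS rn
                FROM tblEmployees)
            DELETE FROM tblEmployees
            WHERE sid IN (SELECT sid FROM cte WHERE rn > 1)"""
    )
    kept = [r[0] for r in conn.execute("SELECT sid FROM tblEmployees ORDER BY sid")]
    conn.close()
    return kept


newest_kept = survivors("timestamp DESC")
oldest_kept = survivors("timestamp")
print(newest_kept)  # [2, 4]  newest row of each group stays
print(oldest_kept)  # [1, 3]  oldest row of each group stays
```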

Edit: Run this script to update the references and then delete the redundant records:

IF( Object_id('tempdb..#temptable') IS NOT NULL )
    DROP TABLE #temptable;

SELECT *,
       ROW_NUMBER()
         OVER (
           partition BY primaryId, secondaryId, employeeId, employeeName
           ORDER BY timestamp DESC) AS rn
INTO   #temptable
FROM   tblEmployees

UPDATE r
SET    r.employeeIdentificataion = t2.sid
FROM   tblEmployeeData r
       JOIN #temptable t1
         ON r.employeeIdentificataion = t1.sid
       JOIN #temptable t2
         ON t1.primaryId = t2.primaryId
            AND t1.secondaryId = t2.secondaryId
            AND t1.employeeId = t2.employeeId
            AND t1.employeeName = t2.employeeName
            AND t1.sid <> t2.sid
            AND t2.rn = 1

DELETE m
FROM   tblEmployees m
       JOIN #temptable t
         ON m.sid = t.sid
WHERE  t.rn > 1;
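Here is a sketch of the same three steps against the question's sample data, again using SQLite through Python's sqlite3 as a stand-in for SQL Server. Assumptions to note: #temptable becomes a SQLite TEMP table, and because older SQLite versions lack UPDATE ... FROM, the join in the UPDATE is written as a correlated subquery restricted to rows with rn > 1 (which also makes the t1.sid <> t2.sid condition unnecessary).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblEmployees (sid INTEGER PRIMARY KEY, primaryId INTEGER, secondaryId INTEGER,
                           employeeId INTEGER, employeeName TEXT, timestamp INTEGER);
CREATE TABLE tblEmployeeData (sid INTEGER PRIMARY KEY, employeeIdentificataion INTEGER);
INSERT INTO tblEmployees VALUES
  (1, 123, 40, 1, 'Eastwood, Clint', 20141016124013),
  (2, 123, 40, 1, 'Eastwood, Clint', 20141016130043),
  (3, 123, 40, 1, 'Westwood, Clint', 20141016165733),
  (4, 123, 40, 1, 'Westwood, Clint', 20141016210205);
INSERT INTO tblEmployeeData VALUES (86, 4), (89, 2), (90, 1), (104, 3);

-- Step 1: number each duplicate group, newest first (rn = 1 is the row to keep).
CREATE TEMP TABLE temptable AS
SELECT *, ROW_NUMBER() OVER (
            PARTITION BY primaryId, secondaryId, employeeId, employeeName
            ORDER BY timestamp DESC) AS rn
FROM tblEmployees;

-- Step 2: repoint references from redundant rows to the surviving row of their group.
UPDATE tblEmployeeData
SET employeeIdentificataion = (
    SELECT t2.sid
    FROM temptable t1
    JOIN temptable t2
      ON t1.primaryId = t2.primaryId AND t1.secondaryId = t2.secondaryId
     AND t1.employeeId = t2.employeeId AND t1.employeeName = t2.employeeName
    WHERE t1.sid = tblEmployeeData.employeeIdentificataion AND t2.rn = 1)
WHERE employeeIdentificataion IN (SELECT sid FROM temptable WHERE rn > 1);

-- Step 3: remove the redundant rows themselves.
DELETE FROM tblEmployees WHERE sid IN (SELECT sid FROM temptable WHERE rn > 1);
""")

data = conn.execute(
    "SELECT sid, employeeIdentificataion FROM tblEmployeeData ORDER BY sid"
).fetchall()
remaining = [r[0] for r in conn.execute("SELECT sid FROM tblEmployees ORDER BY sid")]
print(data)       # [(86, 4), (89, 2), (90, 2), (104, 4)]
print(remaining)  # [2, 4]
```

In the output, tblEmployeeData rows 90 and 104 now point at sids 2 and 4, the newest row of each name group.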

See it in action on SQL Fiddle here.

Answer 1 (score: 0)

The following query keeps the latest record for each combination of primaryId, secondaryId, employeeId, employeeName:

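with mycte as
(
select primaryId, secondaryId, employeeId, employeeName, max(timestamp) as mxtmstp
from tblEmployees
group by primaryId, secondaryId, employeeId, employeeName
)
-- delete every row that is older than the newest row of its group
delete e
from tblEmployees e
join mycte c on e.primaryId = c.primaryId and e.secondaryId = c.secondaryId
    and e.employeeId = c.employeeId and e.employeeName = c.employeeName
where e.timestamp < c.mxtmstp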
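The GROUP BY / MAX(timestamp) idea in this answer can be tried the same way. This sketch uses SQLite via Python's sqlite3 as a stand-in for SQL Server and keeps only rows whose timestamp equals their group's maximum. Unlike the first answer, it does not repoint references in tblEmployeeData, and rows that tie on the maximum timestamp would all be kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblEmployees (sid INTEGER PRIMARY KEY, primaryId INTEGER, secondaryId INTEGER,
                           employeeId INTEGER, employeeName TEXT, timestamp INTEGER);
INSERT INTO tblEmployees VALUES
  (1, 123, 40, 1, 'Eastwood, Clint', 20141016124013),
  (2, 123, 40, 1, 'Eastwood, Clint', 20141016130043),
  (3, 123, 40, 1, 'Westwood, Clint', 20141016165733),
  (4, 123, 40, 1, 'Westwood, Clint', 20141016210205);

-- Newest timestamp per duplicate group.
WITH mycte AS (
    SELECT primaryId, secondaryId, employeeId, employeeName, MAX(timestamp) AS mxtmstp
    FROM tblEmployees
    GROUP BY primaryId, secondaryId, employeeId, employeeName)
-- Keep only the rows that carry their group's newest timestamp.
DELETE FROM tblEmployees
WHERE sid NOT IN (
    SELECT e.sid
    FROM tblEmployees e
    JOIN mycte c
      ON e.primaryId = c.primaryId AND e.secondaryId = c.secondaryId
     AND e.employeeId = c.employeeId AND e.employeeName = c.employeeName
     AND e.timestamp = c.mxtmstp);
""")

remaining = [r[0] for r in conn.execute("SELECT sid FROM tblEmployees ORDER BY sid")]
print(remaining)  # [2, 4]
```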