Deleting repeated rows within date-ordered groups

Time: 2014-10-23 04:52:33

Tags: sql sql-server sql-server-2012

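I doubt my title makes much sense, so I'll do my best to explain what I need. I need to clean up an audit table that tracks when an object's state was modified. For one reason or another, multiple records were created with new dates even though the object's state had not changed. I need to keep the first record of each state change and delete any subsequent records with the same state. Oh, and there is no primary key. Yay! :|

Here is a sample data set: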

ObjectID    ObjectState DateOfEntry
101144      1           2007-08-14 12:39:30.587
101144      1           2007-08-14 12:41:52.620
101144      1           2007-08-14 12:42:11.150
101144      1           2007-08-14 12:42:24.197
101144      3           2007-08-14 12:44:06.403
101144      3           2007-08-14 12:44:06.467
101144      3           2007-08-14 12:46:12.573
101144      3           2007-08-14 12:50:51.670
101144      3           2007-08-14 12:50:51.750
101144      3           2007-08-14 12:56:34.330
101144      4           2007-08-14 17:28:59.280
101144      3           2007-08-14 17:32:26.313
101144      3           2007-08-14 17:32:48.720
101144      3           2007-08-14 17:45:07.460
101144      3           2007-08-14 17:46:31.740
101144      3           2007-08-14 17:47:04.380
101144      3           2007-08-14 17:47:29.507
101144      3           2007-08-14 17:49:13.460
101144      3           2007-08-14 17:54:15.320
101144      3           2007-08-14 17:55:57.540
101144      3           2007-08-14 19:50:11.913
101144      3           2007-08-14 19:53:10.820
101144      3           2007-08-14 20:03:44.900
101144      3           2007-08-16 10:34:56.477
101144      3           2007-08-16 10:36:06.477
101144      3           2007-08-16 10:36:24.570
101144      3           2007-11-06 09:19:26.157
101144      3           2007-11-06 09:24:28.200
101144      4           2010-09-27 14:11:03.287
101144      4           2014-01-27 17:31:58.077

The final table should look like this:

ObjectID    ObjectState DateOfEntry
101144      1           2007-08-14 12:39:30.587
101144      3           2007-08-14 12:44:06.403
101144      4           2007-08-14 17:28:59.280
101144      3           2007-08-14 17:32:26.313
101144      4           2010-09-27 14:11:03.287

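I tried using RANK(), but the problem is that I can't order by ObjectState alone, because ObjectState values can repeat out of order. I have to order the rows by DateOfEntry. But if I use RANK() OVER (ORDER BY DateOfEntry), I basically just get row numbers.

How can I write a SQL query that orders by DateOfEntry and then groups by ObjectState, so that I can delete every row in each "object state group" except the earliest one?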
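
One way to picture the "object state group" described above is the gaps-and-islands technique, which does not come from this thread: subtracting a per-state row number from a per-object row number gives a value that stays constant within each consecutive run of the same state. The following is a minimal sketch, assuming a table variable named @Audits with the three columns from the sample data and no tied DateOfEntry values within an object. The names Numbered, IslandId, FirstEntry and RowsInGroup are made up for illustration, and the six rows are a subset of the sample data.

```sql
DECLARE @Audits TABLE (ObjectID INT, ObjectState INT, DateOfEntry DATETIME);
INSERT @Audits (ObjectID, ObjectState, DateOfEntry) VALUES
    (101144, 1, '2007-08-14 12:39:30.587'),
    (101144, 1, '2007-08-14 12:41:52.620'),
    (101144, 3, '2007-08-14 12:44:06.403'),
    (101144, 3, '2007-08-14 12:44:06.467'),
    (101144, 4, '2007-08-14 17:28:59.280'),
    (101144, 3, '2007-08-14 17:32:26.313');

WITH Numbered AS (
    SELECT
        ObjectID,
        ObjectState,
        DateOfEntry,
        -- Position within the object minus position within the (object, state) pair.
        -- The difference is constant across a consecutive run of one state.
        ROW_NUMBER() OVER (PARTITION BY ObjectID ORDER BY DateOfEntry)
      - ROW_NUMBER() OVER (PARTITION BY ObjectID, ObjectState ORDER BY DateOfEntry) AS IslandId
    FROM @Audits
)
-- IslandId is only meaningful together with ObjectID and ObjectState,
-- so all three columns go into the GROUP BY.
SELECT
    ObjectID,
    ObjectState,
    MIN(DateOfEntry) AS FirstEntry,
    COUNT(*)         AS RowsInGroup
FROM Numbered
GROUP BY ObjectID, ObjectState, IslandId
ORDER BY ObjectID, FirstEntry;
```

Each result row stands for one run of identical states, and FirstEntry identifies the row that would be kept. This only reports the groups; it does not delete anything.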

3 answers:

Answer 0: (score: 3)

Short answer:

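;WITH Records AS (
    -- Number each object's rows in date order so consecutive entries are adjacent.
    SELECT
        ObjectID,
        ObjectState,
        DateOfEntry,
        ROW_NUMBER() OVER (PARTITION BY ObjectID ORDER BY DateOfEntry) AS RowNum
    FROM @Audits
)
-- Delete each row that directly follows a row with the same state.
DELETE R2
FROM Records R1
    INNER JOIN Records R2
        ON R1.ObjectID = R2.ObjectID
            AND R1.ObjectState = R2.ObjectState
            AND R1.RowNum + 1 = R2.RowNum;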
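
Since the table has no primary key and a committed DELETE cannot be undone, it may be worth previewing which rows the join matches before removing them. The sketch below keeps the same CTE and join conditions but swaps DELETE for SELECT. The six-row sample is a subset of the question's data and is included only to make the snippet self-contained.

```sql
DECLARE @Audits TABLE (ObjectID INT, ObjectState INT, DateOfEntry DATETIME);
INSERT @Audits (ObjectID, ObjectState, DateOfEntry) VALUES
    (101144, 1, '2007-08-14 12:39:30.587'),
    (101144, 1, '2007-08-14 12:41:52.620'),
    (101144, 3, '2007-08-14 12:44:06.403'),
    (101144, 3, '2007-08-14 12:44:06.467'),
    (101144, 4, '2007-08-14 17:28:59.280'),
    (101144, 3, '2007-08-14 17:32:26.313');

WITH Records AS (
    SELECT
        ObjectID,
        ObjectState,
        DateOfEntry,
        ROW_NUMBER() OVER (PARTITION BY ObjectID ORDER BY DateOfEntry) AS RowNum
    FROM @Audits
)
-- R2 is the row that would be deleted; R1 is its immediate predecessor.
SELECT R2.ObjectID, R2.ObjectState, R2.DateOfEntry
FROM Records R1
    INNER JOIN Records R2
        ON R1.ObjectID = R2.ObjectID
            AND R1.ObjectState = R2.ObjectState
            AND R1.RowNum + 1 = R2.RowNum
ORDER BY R2.ObjectID, R2.DateOfEntry;
```

With this sample, the query should list the 12:41:52.620 and 12:44:06.467 rows, which are the second entries of the state 1 and state 3 runs.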

Proof of the solution

DECLARE @Audits TABLE (ObjectID INT, ObjectState INT, DateOfEntry DATETIME)
INSERT @Audits
    SELECT 101144,1,'2007-08-14 12:39:30.587' UNION ALL
    SELECT 101144,1,'2007-08-14 12:41:52.620' UNION ALL
    SELECT 101144,1,'2007-08-14 12:42:11.150' UNION ALL
    SELECT 101144,1,'2007-08-14 12:42:24.197' UNION ALL
    SELECT 101144,3,'2007-08-14 12:44:06.403' UNION ALL
    SELECT 101144,3,'2007-08-14 12:44:06.467' UNION ALL
    SELECT 101144,3,'2007-08-14 12:46:12.573' UNION ALL
    SELECT 101144,3,'2007-08-14 12:50:51.670' UNION ALL
    SELECT 101144,3,'2007-08-14 12:50:51.750' UNION ALL
    SELECT 101144,3,'2007-08-14 12:56:34.330' UNION ALL
    SELECT 101144,4,'2007-08-14 17:28:59.280' UNION ALL
    SELECT 101144,3,'2007-08-14 17:32:26.313' UNION ALL
    SELECT 101144,3,'2007-08-14 17:32:48.720' UNION ALL
    SELECT 101144,3,'2007-08-14 17:45:07.460' UNION ALL
    SELECT 101144,3,'2007-08-14 17:46:31.740' UNION ALL
    SELECT 101144,3,'2007-08-14 17:47:04.380' UNION ALL
    SELECT 101144,3,'2007-08-14 17:47:29.507' UNION ALL
    SELECT 101144,3,'2007-08-14 17:49:13.460' UNION ALL
    SELECT 101144,3,'2007-08-14 17:54:15.320' UNION ALL
    SELECT 101144,3,'2007-08-14 17:55:57.540' UNION ALL
    SELECT 101144,3,'2007-08-14 19:50:11.913' UNION ALL
    SELECT 101144,3,'2007-08-14 19:53:10.820' UNION ALL
    SELECT 101144,3,'2007-08-14 20:03:44.900' UNION ALL
    SELECT 101144,3,'2007-08-16 10:34:56.477' UNION ALL
    SELECT 101144,3,'2007-08-16 10:36:06.477' UNION ALL
    SELECT 101144,3,'2007-08-16 10:36:24.570' UNION ALL
    SELECT 101144,3,'2007-11-06 09:19:26.157' UNION ALL
    SELECT 101144,3,'2007-11-06 09:24:28.200' UNION ALL
    SELECT 101144,4,'2010-09-27 14:11:03.287' UNION ALL
    SELECT 101144,4,'2014-01-27 17:31:58.077'

;WITH Records AS (
    -- Number each object's rows in date order so consecutive entries are adjacent.
    SELECT
        ObjectID,
        ObjectState,
        DateOfEntry,
        ROW_NUMBER() OVER (PARTITION BY ObjectID ORDER BY DateOfEntry) AS RowNum
    FROM @Audits
)
-- Delete each row that directly follows a row with the same state.
DELETE R2
FROM Records R1
    INNER JOIN Records R2
        ON R1.ObjectID = R2.ObjectID
            AND R1.ObjectState = R2.ObjectState
            AND R1.RowNum + 1 = R2.RowNum;

SELECT * FROM @Audits;

This produces the following output:

ObjectID    ObjectState DateOfEntry
----------- ----------- -----------------------
101144      1           2007-08-14 12:39:30.587
101144      3           2007-08-14 12:44:06.403
101144      4           2007-08-14 17:28:59.280
101144      3           2007-08-14 17:32:26.313
101144      4           2010-09-27 14:11:03.287

Answer 1: (score: 0)

If @table is your table, the SQL below may help. I am assuming DateOfEntry is sorted.

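-- Delete each row whose immediate predecessor (per object, in date order) has the same state.
-- PARTITION BY ObjectID keeps rows of different objects from being compared with each other.
DELETE      B
FROM        (SELECT *, ROW_NUMBER() OVER (PARTITION BY ObjectID ORDER BY DateOfEntry) AS [Row] FROM @table) A
INNER JOIN  (SELECT *, ROW_NUMBER() OVER (PARTITION BY ObjectID ORDER BY DateOfEntry) AS [Row] FROM @table) B
            ON  A.ObjectID = B.ObjectID
            AND A.[Row] = B.[Row] - 1
            AND A.ObjectState = B.ObjectState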

My before and after results (screenshot not included)

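Note: please ignore the data in the DateOfEntry column. I kept it as plain numbers for convenience.

Answer 2: (score: 0)

I eventually came up with an alternative solution using LAG() that eliminates the CTE.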

-- Delete each row whose state equals the previous row's state for the same object.
-- The DELETE targets the alias a1, since the table is aliased in the FROM clause.
DELETE a1
FROM @Audits a1
INNER JOIN (SELECT ObjectID, DateOfEntry
                FROM (SELECT ObjectID, DateOfEntry, ObjectState,
                        LAG(ObjectState) OVER(PARTITION BY ObjectID ORDER BY DateOfEntry) AS [PreviousObjectState]
                          FROM @Audits) AS Audits
             WHERE Audits.ObjectState = PreviousObjectState
             ) a2
 ON a2.ObjectID = a1.ObjectID AND a2.DateOfEntry = a1.DateOfEntry

SELECT * FROM @Audits
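
The join on ObjectID and DateOfEntry above assumes that no two rows for the same object share a timestamp, which is worth checking on a table without a primary key. One possible variant, which is not part of the original answer and does bring a CTE back, deletes directly through a CTE that carries the LAG() column, so no join back to the table is needed. This is a sketch under the assumption that SQL Server accepts a DELETE through a single-table CTE containing a window function, as it does for the ROW_NUMBER() CTE in answer 0. The CTE name Flagged is hypothetical, and the sample rows are a subset of the question's data.

```sql
DECLARE @Audits TABLE (ObjectID INT, ObjectState INT, DateOfEntry DATETIME);
INSERT @Audits (ObjectID, ObjectState, DateOfEntry) VALUES
    (101144, 1, '2007-08-14 12:39:30.587'),
    (101144, 1, '2007-08-14 12:41:52.620'),
    (101144, 3, '2007-08-14 12:44:06.403'),
    (101144, 3, '2007-08-14 12:44:06.467'),
    (101144, 4, '2007-08-14 17:28:59.280'),
    (101144, 3, '2007-08-14 17:32:26.313');

WITH Flagged AS (
    SELECT
        ObjectState,
        -- State of the previous row for the same object; NULL for the first row.
        LAG(ObjectState) OVER (PARTITION BY ObjectID ORDER BY DateOfEntry) AS PreviousObjectState
    FROM @Audits
)
-- The first row of each object is kept because NULL never compares equal.
DELETE FROM Flagged
WHERE ObjectState = PreviousObjectState;

SELECT ObjectID, ObjectState, DateOfEntry
FROM @Audits
ORDER BY ObjectID, DateOfEntry;
```

Because the DELETE acts on the flagged rows themselves, it does not depend on (ObjectID, DateOfEntry) being unique, although tied timestamps would still make the row order seen by LAG() arbitrary.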

Long version with proof (I duplicated the data set under a different ID to verify that the partitioning works as expected)

DECLARE @Audits TABLE (ObjectID INT, ObjectState INT, DateOfEntry DATETIME)
INSERT @Audits
    SELECT 101144,1,'2007-08-14 12:39:30.587' UNION ALL
    SELECT 101144,1,'2007-08-14 12:41:52.620' UNION ALL
    SELECT 101144,1,'2007-08-14 12:42:11.150' UNION ALL
    SELECT 101144,1,'2007-08-14 12:42:24.197' UNION ALL
    SELECT 101144,3,'2007-08-14 12:44:06.403' UNION ALL
    SELECT 101144,3,'2007-08-14 12:44:06.467' UNION ALL
    SELECT 101144,3,'2007-08-14 12:46:12.573' UNION ALL
    SELECT 101144,3,'2007-08-14 12:50:51.670' UNION ALL
    SELECT 101144,3,'2007-08-14 12:50:51.750' UNION ALL
    SELECT 101144,3,'2007-08-14 12:56:34.330' UNION ALL
    SELECT 101144,4,'2007-08-14 17:28:59.280' UNION ALL
    SELECT 101144,3,'2007-08-14 17:32:26.313' UNION ALL
    SELECT 101144,3,'2007-08-14 17:32:48.720' UNION ALL
    SELECT 101144,3,'2007-08-14 17:45:07.460' UNION ALL
    SELECT 101144,3,'2007-08-14 17:46:31.740' UNION ALL
    SELECT 101144,3,'2007-08-14 17:47:04.380' UNION ALL
    SELECT 101144,3,'2007-08-14 17:47:29.507' UNION ALL
    SELECT 101144,3,'2007-08-14 17:49:13.460' UNION ALL
    SELECT 101144,3,'2007-08-14 17:54:15.320' UNION ALL
    SELECT 101144,3,'2007-08-14 17:55:57.540' UNION ALL
    SELECT 101144,3,'2007-08-14 19:50:11.913' UNION ALL
    SELECT 101144,3,'2007-08-14 19:53:10.820' UNION ALL
    SELECT 101144,3,'2007-08-14 20:03:44.900' UNION ALL
    SELECT 101144,3,'2007-08-16 10:34:56.477' UNION ALL
    SELECT 101144,3,'2007-08-16 10:36:06.477' UNION ALL
    SELECT 101144,3,'2007-08-16 10:36:24.570' UNION ALL
    SELECT 101144,3,'2007-11-06 09:19:26.157' UNION ALL
    SELECT 101144,3,'2007-11-06 09:24:28.200' UNION ALL
    SELECT 101144,4,'2010-09-27 14:11:03.287' UNION ALL
    SELECT 101144,4,'2014-01-27 17:31:58.077' UNION ALL
    SELECT 101145,1,'2007-08-14 12:39:30.587' UNION ALL
    SELECT 101145,1,'2007-08-14 12:41:52.620' UNION ALL
    SELECT 101145,1,'2007-08-14 12:42:11.150' UNION ALL
    SELECT 101145,1,'2007-08-14 12:42:24.197' UNION ALL
    SELECT 101145,3,'2007-08-14 12:44:06.403' UNION ALL
    SELECT 101145,3,'2007-08-14 12:44:06.467' UNION ALL
    SELECT 101145,3,'2007-08-14 12:46:12.573' UNION ALL
    SELECT 101145,3,'2007-08-14 12:50:51.670' UNION ALL
    SELECT 101145,3,'2007-08-14 12:50:51.750' UNION ALL
    SELECT 101145,3,'2007-08-14 12:56:34.330' UNION ALL
    SELECT 101145,4,'2007-08-14 17:28:59.280' UNION ALL
    SELECT 101145,3,'2007-08-14 17:32:26.313' UNION ALL
    SELECT 101145,3,'2007-08-14 17:32:48.720' UNION ALL
    SELECT 101145,3,'2007-08-14 17:45:07.460' UNION ALL
    SELECT 101145,3,'2007-08-14 17:46:31.740' UNION ALL
    SELECT 101145,3,'2007-08-14 17:47:04.380' UNION ALL
    SELECT 101145,3,'2007-08-14 17:47:29.507' UNION ALL
    SELECT 101145,3,'2007-08-14 17:49:13.460' UNION ALL
    SELECT 101145,3,'2007-08-14 17:54:15.320' UNION ALL
    SELECT 101145,3,'2007-08-14 17:55:57.540' UNION ALL
    SELECT 101145,3,'2007-08-14 19:50:11.913' UNION ALL
    SELECT 101145,3,'2007-08-14 19:53:10.820' UNION ALL
    SELECT 101145,3,'2007-08-14 20:03:44.900' UNION ALL
    SELECT 101145,3,'2007-08-16 10:34:56.477' UNION ALL
    SELECT 101145,3,'2007-08-16 10:36:06.477' UNION ALL
    SELECT 101145,3,'2007-08-16 10:36:24.570' UNION ALL
    SELECT 101145,3,'2007-11-06 09:19:26.157' UNION ALL
    SELECT 101145,3,'2007-11-06 09:24:28.200' UNION ALL
    SELECT 101145,4,'2010-09-27 14:11:03.287' UNION ALL
    SELECT 101145,4,'2014-01-27 17:31:58.077'

-- Delete each row whose state equals the previous row's state for the same object.
-- The DELETE targets the alias a1, since the table is aliased in the FROM clause.
DELETE a1
FROM @Audits a1
INNER JOIN (SELECT ObjectID, DateOfEntry
                FROM (SELECT ObjectID, DateOfEntry, ObjectState,
                        LAG(ObjectState) OVER(PARTITION BY ObjectID ORDER BY DateOfEntry) AS [PreviousObjectState]
                          FROM @Audits) AS Audits
             WHERE Audits.ObjectState = PreviousObjectState
             ) a2
 ON a2.ObjectID = a1.ObjectID AND a2.DateOfEntry = a1.DateOfEntry

SELECT * FROM @Audits

This produces the following output:

ObjectID    ObjectState DateOfEntry
----------- ----------- -----------------------
101144      1           2007-08-14 12:39:30.587
101144      3           2007-08-14 12:44:06.403
101144      4           2007-08-14 17:28:59.280
101144      3           2007-08-14 17:32:26.313
101144      4           2010-09-27 14:11:03.287
101145      1           2007-08-14 12:39:30.587
101145      3           2007-08-14 12:44:06.403
101145      4           2007-08-14 17:28:59.280
101145      3           2007-08-14 17:32:26.313
101145      4           2010-09-27 14:11:03.287