TSQL - Finding duplicate rows by grouping on a time window - without a cursor

Time: 2014-11-20 13:28:47

Tags: sql-server

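A certain system sends duplicate messages that match on every important characteristic except the datetime. The system can send up to 50 duplicates that differ by only a few seconds; for example, the first message is sent at 07:59:41 (hh:mm:ss) and the 50th at 08:00:07, with all the others falling in between. I want to treat these messages as the same, process the first one that my service has not yet processed, and mark the others with an error code. I chose a two-minute window within which messages are treated as equal; that is an acceptable assumption.

I have the following working logic, but the query is extremely slow for larger pools of potential duplicates. For fewer than 10 records it finishes in a few seconds, but for 1000 records it takes about ten minutes.

Here is the logic as implemented: for each message, check whether the service has already processed any message received up to one minute before or one minute after it. If such a message is found, set the current message to an error code.

Questions:
1. Is there an alternative to a cursor for this scenario? (I tried a while loop, but it did not perform any better.)
2. If not, what can I do to improve performance? (Trying to limit the records searched based on the primary key had no effect on performance; that attempt is commented out in the code.)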

 --create dataset
SELECT *
INTO #temp
FROM rosterload.RosterFeed WITH (NOLOCK)
WHERE StatusCode = 1
    AND DateDiff(MINUTE, CreatedTimeStamp, SYSDATETIME()) > 60

DECLARE @PKpotentialDuplicate AS INT;
DECLARE @PKrange INT
--SET @PKrange = 2000
DECLARE @DuplicateCursor AS CURSOR;

SET @DuplicateCursor = CURSOR
FOR

SELECT PK_RosterFeed
FROM #temp --TODO: rosterload.RosterFeed where StatusCode = 1

OPEN @DuplicateCursor;

FETCH NEXT
FROM @DuplicateCursor
INTO @PKpotentialDuplicate

WHILE @@FETCH_STATUS = 0
BEGIN
    IF EXISTS (
            --DECLARE @PKpotentialDuplicate as INT; DECLARE @PKrange int SET @PKrange = 2000 SET @PKpotentialDuplicate = 531474
            SELECT TOP 1 *
            FROM rosterload.RosterFeed WITH (NOLOCK) --rosterload.RosterFeed 
            WHERE
                --PK_RosterFeed > (select PK_RosterFeed - 2000 from rosterload.RosterFeed where PK_RosterFeed = @PKpotentialDuplicate) and PK_RosterFeed < (select PK_RosterFeed + 2000 from rosterload.RosterFeed where PK_RosterFeed = @PKpotentialDuplicate) and
                ABS(DATEDIFF(SECOND, EvtTimeStamp, (
                            SELECT EvtTimeStamp
                            FROM rosterload.RosterFeed
                            WHERE PK_RosterFeed = @PKpotentialDuplicate
                            ))) < 60
                AND AssigningAuthorityCode = (
                    SELECT AssigningAuthorityCode
                    FROM rosterload.RosterFeed
                    WHERE PK_RosterFeed = @PKpotentialDuplicate
                    )
                AND PatientIdentifier = (
                    SELECT PatientIdentifier
                    FROM rosterload.RosterFeed
                    WHERE PK_RosterFeed = @PKpotentialDuplicate
                    )
                AND EventType = (
                    SELECT EventType
                    FROM rosterload.RosterFeed
                    WHERE PK_RosterFeed = @PKpotentialDuplicate
                    )
                AND PatientClass = (
                    SELECT PatientClass
                    FROM rosterload.RosterFeed
                    WHERE PK_RosterFeed = @PKpotentialDuplicate
                    )
                AND MessageType = (
                    SELECT MessageType
                    FROM rosterload.RosterFeed
                    WHERE PK_RosterFeed = @PKpotentialDuplicate
                    )
                AND ABS(DATEDIFF(SECOND, EvtTimeStamp, (
                            SELECT EvtTimeStamp
                            FROM rosterload.RosterFeed
                            WHERE PK_RosterFeed = @PKpotentialDuplicate
                            ))) < 60
                AND PK_RosterFeed != @PKpotentialDuplicate
                AND StatusCode <> 1
            )
    BEGIN
        UPDATE #temp
        SET StatusCode = 15
        WHERE PK_RosterFeed = @PKpotentialDuplicate
    END

    FETCH NEXT
    FROM @DuplicateCursor
    INTO @PKpotentialDuplicate
END

CLOSE @DuplicateCursor;

DEALLOCATE @DuplicateCursor;

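1 answer:

Answer 0 (score: 4)

Pretty sure you can do this with a regular set-based update. The EXISTS clause simply checks for a similar, already-processed record in the preceding minute; where one is found, the later record is updated: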

UPDATE  r
SET     StatusCode = 15
FROM    rosterload.RosterFeed AS r
WHERE   r.StatusCode = 1
AND     r.CreatedTimeStamp < DATEADD(MINUTE, -60, SYSDATETIME()) -- created more than 60 minutes ago, as in the question's filter
AND     EXISTS
        (   SELECT  1
            FROM    rosterload.RosterFeed AS r2
            WHERE   r2.AssigningAuthorityCode = r.AssigningAuthorityCode
            AND     r2.EventType = r.EventType
            AND     r2.PatientIdentifier = r.PatientIdentifier
            AND     r2.PatientClass = r.PatientClass
            AND     r2.MessageType = r.MessageType
            AND     r2.EvtTimeStamp < r.EvtTimeStamp
            AND     r2.EvtTimeStamp >= DATEADD(SECOND, -60, r.EvtTimeStamp)
            AND     r2.StatusCode != 1
        );
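
Note that the update above only looks backwards in time, whereas the cursor in the question checks for a processed message up to 60 seconds before or after. If the symmetric window is what you want, a minimal sketch might look like the following. The table variable @RosterFeed, its column types, the sample rows, and the use of StatusCode = 2 for "already processed" are all assumptions made for illustration; the CreatedTimeStamp filter is left out to keep the example short. The range predicates only approximate ABS(DATEDIFF(SECOND, ...)) < 60, because DATEDIFF counts second boundaries rather than elapsed time.

```sql
-- Hypothetical stand-in for rosterload.RosterFeed, reduced to the columns used here.
DECLARE @RosterFeed TABLE (
    PK_RosterFeed          INT          NOT NULL PRIMARY KEY,
    AssigningAuthorityCode VARCHAR(20)  NOT NULL,
    PatientIdentifier      VARCHAR(20)  NOT NULL,
    EventType              VARCHAR(10)  NOT NULL,
    PatientClass           VARCHAR(10)  NOT NULL,
    MessageType            VARCHAR(10)  NOT NULL,
    EvtTimeStamp           DATETIME2(0) NOT NULL,
    StatusCode             INT          NOT NULL
);

-- Sample data (invented): StatusCode 1 = unprocessed, 2 = processed (assumed value).
INSERT INTO @RosterFeed VALUES
    (1, 'AA', 'P1', 'A01', 'I', 'ADT', '2014-11-20 07:59:41', 1), -- 19 s before row 2
    (2, 'AA', 'P1', 'A01', 'I', 'ADT', '2014-11-20 08:00:00', 2), -- already processed
    (3, 'AA', 'P1', 'A01', 'I', 'ADT', '2014-11-20 08:00:07', 1), -- 7 s after row 2
    (4, 'AA', 'P1', 'A01', 'I', 'ADT', '2014-11-20 08:05:00', 1), -- 5 min after row 2
    (5, 'AA', 'P2', 'A01', 'I', 'ADT', '2014-11-20 08:00:03', 1); -- different patient

-- Flag every unprocessed row that has a processed twin within 60 seconds
-- on either side. No PK inequality is needed: r has StatusCode = 1 and
-- r2 has StatusCode <> 1, so a row can never match itself.
UPDATE  r
SET     StatusCode = 15
FROM    @RosterFeed AS r
WHERE   r.StatusCode = 1
AND     EXISTS
        (   SELECT  1
            FROM    @RosterFeed AS r2
            WHERE   r2.AssigningAuthorityCode = r.AssigningAuthorityCode
            AND     r2.EventType = r.EventType
            AND     r2.PatientIdentifier = r.PatientIdentifier
            AND     r2.PatientClass = r.PatientClass
            AND     r2.MessageType = r.MessageType
            AND     r2.EvtTimeStamp > DATEADD(SECOND, -60, r.EvtTimeStamp)
            AND     r2.EvtTimeStamp < DATEADD(SECOND, 60, r.EvtTimeStamp)
            AND     r2.StatusCode != 1
        );

-- Rows 1 and 3 now have StatusCode 15; rows 2, 4 and 5 are unchanged.
SELECT  PK_RosterFeed, StatusCode
FROM    @RosterFeed
ORDER BY PK_RosterFeed;
```

Comparing r2.EvtTimeStamp directly against DATEADD bounds, rather than wrapping it in ABS(DATEDIFF(...)), keeps the column bare on one side of the comparison, so an index that leads with the equality columns and ends with EvtTimeStamp could be used for a seek. Whether such an index exists or would help on the real table is something to check against the actual execution plan.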