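Removing consecutive duplicate values within a date-based timeline

Time: 2018-09-07 13:27:39

Tags: sql-server tsql sql-server-2012

I have a table containing date-based user actions. The table serves as a timeline of events. The following example shows how two people changed job roles over time: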

DECLARE @tbl TABLE (
    UserID int,
    ActionID int,
    ActionDesc nvarchar(50),
    ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES 
    -- First person
    (1, 200, 'Promoted',   '2000-01-01'),   
    (1, 200, 'Promoted',   '2001-01-01'),   
    (1, 200, 'Promoted',   '2002-02-01'),   
    (1, 300, 'Moved',      '2004-03-01'),   
    (1, 200, 'Promoted',   '2005-03-01'),   
    (1, 200, 'Promoted',   '2006-03-01'),
    -- Second person
    (2, 200, 'Promoted',   '2006-01-01'),   
    (2, 300, 'Moved',      '2007-01-01'),
    (2, 200, 'Promoted',   '2008-01-01');

SELECT * FROM @tbl ORDER BY UserID, ActionDate DESC;

This displays the following, with the most recent events first:

[image: query result]

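I need to display the table in reverse date order, but with every event removed that repeats the event immediately before it, based on a [UserID / ActionID] match. For example, if a person was promoted and then promoted again straight afterwards, the second promotion should not appear in the results, because it counts as a duplicate of the preceding action.

So the desired output is:

[image: desired result]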

After some research, I tried using ROW_NUMBER() to identify the duplicates:

SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY UserID, ActionID ORDER BY ActionDate ASC) AS RowNum
FROM
    @tbl
ORDER BY
    UserID, ActionDate DESC;

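...but it doesn't quite work, because the numbering does not reset after each different action. I may be overthinking this, but I'm looking for inspiration, since searching returns countless questions where people simply remove duplicates from a list.

3 answers:

Answer 0: (score: 4)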

I'd use LEAD to eliminate the unwanted rows.

USE tempdb;

DECLARE @tbl TABLE (
    UserID int,
    ActionID int,
    ActionDesc nvarchar(50),
    ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES 
    -- First person
    (1, 200, 'Promoted',   '2000-01-01'),
    (1, 200, 'Promoted',   '2001-01-01'),
    (1, 200, 'Promoted',   '2002-02-01'),
    (1, 300, 'Moved',      '2004-03-01'),
    (1, 200, 'Promoted',   '2005-03-01'),
    (1, 200, 'Promoted',   '2006-03-01'),
    -- Second person
    (2, 200, 'Promoted',   '2006-01-01'),
    (2, 300, 'Moved',      '2007-01-01'),
    (2, 200, 'Promoted',   '2008-01-01');

;WITH src AS
(
    SELECT *
        , l = LEAD(t.ActionID) OVER (PARTITION BY t.UserID ORDER BY t.ActionDate DESC)
    FROM @tbl t
)
SELECT src.UserID
    , src.ActionID
    , src.ActionDesc
    , src.ActionDate
FROM src
WHERE src.l <> src.ActionID
    OR src.l IS NULL

The WHERE clause in the query above eliminates rows from the output whose prior row (the next-older event for the same user) has the same ActionID as the current row. The src.l IS NULL condition keeps the rows for which there is no prior row to duplicate, that is, each user's oldest event.

Results:

╔════════╦══════════╦════════════╦═════════════════════════╗
║ UserID ║ ActionID ║ ActionDesc ║       ActionDate        ║
╠════════╬══════════╬════════════╬═════════════════════════╣
║      1 ║      200 ║ Promoted   ║ 2005-03-01 00:00:00.000 ║
║      1 ║      300 ║ Moved      ║ 2004-03-01 00:00:00.000 ║
║      1 ║      200 ║ Promoted   ║ 2000-01-01 00:00:00.000 ║
║      2 ║      200 ║ Promoted   ║ 2008-01-01 00:00:00.000 ║
║      2 ║      300 ║ Moved      ║ 2007-01-01 00:00:00.000 ║
║      2 ║      200 ║ Promoted   ║ 2006-01-01 00:00:00.000 ║
╚════════╩══════════╩════════════╩═════════════════════════╝
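
To see why that WHERE clause works, it can help to look at the LEAD column before any filtering. The following is only a sketch that reuses the sample data from the question; the column alias NextOlderActionID is made up for this illustration and does not appear in the answer.

```sql
DECLARE @tbl TABLE (
    UserID int,
    ActionID int,
    ActionDesc nvarchar(50),
    ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES
    (1, 200, 'Promoted', '2000-01-01'),
    (1, 200, 'Promoted', '2001-01-01'),
    (1, 200, 'Promoted', '2002-02-01'),
    (1, 300, 'Moved',    '2004-03-01'),
    (1, 200, 'Promoted', '2005-03-01'),
    (1, 200, 'Promoted', '2006-03-01'),
    (2, 200, 'Promoted', '2006-01-01'),
    (2, 300, 'Moved',    '2007-01-01'),
    (2, 200, 'Promoted', '2008-01-01');

SELECT t.UserID
    , t.ActionID
    , t.ActionDate
    -- Because the window is ordered by date DESC, the "next" row that LEAD
    -- looks at is the next-older event of the same user.
    -- NextOlderActionID is a hypothetical alias used only in this sketch.
    , NextOlderActionID = LEAD(t.ActionID) OVER (PARTITION BY t.UserID ORDER BY t.ActionDate DESC)
FROM @tbl t
ORDER BY t.UserID, t.ActionDate DESC;
```

For user 1 the new column reads 200, 300, 200, 200, 200, NULL from newest to oldest, so only the rows dated 2005-03-01, 2004-03-01 and 2000-01-01 differ from their own ActionID or are NULL. The explicit ORDER BY is an addition in this sketch; without one, SQL Server does not guarantee the order in which rows are returned.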

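For tables with a large number of rows, you want to keep the number of aggregates used in the query to a minimum; LEAD needs only a single aggregate to provide this functionality. The execution plan for my version:

[image: execution plan]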

Answer 1: (score: 2)

DECLARE @tbl TABLE (
    UserID int,
    ActionID int,
    ActionDesc nvarchar(50),
    ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES 
    -- First person
    (1, 200, 'Promoted',   '2000-01-01'),   
    (1, 200, 'Promoted',   '2001-01-01'),   
    (1, 200, 'Promoted',   '2002-02-01'),   
    (1, 300, 'Moved',      '2004-03-01'),   
    (1, 200, 'Promoted',   '2005-03-01'),   
    (1, 200, 'Promoted',   '2006-03-01'),
    -- Second person
    (2, 200, 'Promoted',   '2006-01-01'),   
    (2, 300, 'Moved',      '2007-01-01'), --<<--- here ActionID is 300
    (2, 200, 'Promoted',   '2008-01-01');

select UserID, ActionID, ActionDesc, min(ActionDate) as dt
  from (
         select t.*
              , row_number() over(partition by UserID, ActionID order by ActionDate)
                - row_number() over(partition by UserID order by ActionDate) as grp_id
           from @tbl t
       ) v
 group by grp_id, UserID, ActionID, ActionDesc
 order by UserID, min(ActionDate) desc;

This gives your result, but only if the ActionID for Moved is 300; otherwise, partition by ActionDesc instead of ActionID.

Answer 2: (score: 2)
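
The grouping in this answer relies on the difference between two row numbers staying constant within a run of identical actions. A sketch that exposes the intermediate values on the same sample data may make that easier to follow; the aliases rn_action and rn_user are introduced here for illustration only.

```sql
DECLARE @tbl TABLE (
    UserID int,
    ActionID int,
    ActionDesc nvarchar(50),
    ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES
    (1, 200, 'Promoted', '2000-01-01'),
    (1, 200, 'Promoted', '2001-01-01'),
    (1, 200, 'Promoted', '2002-02-01'),
    (1, 300, 'Moved',    '2004-03-01'),
    (1, 200, 'Promoted', '2005-03-01'),
    (1, 200, 'Promoted', '2006-03-01'),
    (2, 200, 'Promoted', '2006-01-01'),
    (2, 300, 'Moved',    '2007-01-01'),
    (2, 200, 'Promoted', '2008-01-01');

SELECT t.UserID
    , t.ActionID
    , t.ActionDate
    -- Position of the row among the rows of the same user and action
    -- (rn_action and rn_user are hypothetical aliases for this sketch).
    , rn_action = ROW_NUMBER() OVER (PARTITION BY t.UserID, t.ActionID ORDER BY t.ActionDate)
    -- Position of the row among all rows of the same user.
    , rn_user   = ROW_NUMBER() OVER (PARTITION BY t.UserID ORDER BY t.ActionDate)
    -- Both counters advance together inside a run, so the difference is constant there.
    , grp_id    = ROW_NUMBER() OVER (PARTITION BY t.UserID, t.ActionID ORDER BY t.ActionDate)
                - ROW_NUMBER() OVER (PARTITION BY t.UserID ORDER BY t.ActionDate)
FROM @tbl t
ORDER BY t.UserID, t.ActionDate;
```

For user 1, grp_id is 0 for the first three promotions, -3 for the move, and -1 for the last two promotions. For user 2, the move and the second promotion both get -1, which suggests why the answer groups by ActionID as well as grp_id rather than by grp_id alone.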

SELECT * FROM
    (SELECT *, ROW_NUMBER() over (partition by Q2.userid, Q2.ActionId, rn2 order by Q2.actiondate) rn3 FROM
        (select *, Q1.rn - ROW_NUMBER() over (partition by Q1.userid, Q1.actionid order by Q1.actiondate) rn2 from 
            (SELECT *,ROW_NUMBER() over (order by userid, actiondate) rn from @tbl) Q1
        ) Q2
    ) 
Q3 Where q3.rn3 = 1 ORDER BY Q3.UserID,Q3.ActionDate 

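The first (innermost) query assigns each row a row_number, ordered by userid and actiondate. I then compute a second row_number in the same date order, but partitioned by userid and actionid. Subtracting the second from the first gives a number that is the same for every row in one consecutive run of a given user and action. Finally, with a third row_number, partitioned by userid, ActionId and that difference and ordered by date, I can pick the earliest date of each run as row 1.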
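
The same three steps can also be written as named CTEs, which some readers find easier to follow than nested subqueries. This is a sketch rather than the answerer's code: the CTE names (Numbered, Islands, Ranked) and the aliases island and pos are invented here, and the final sort uses ActionDate DESC to match the reverse date order asked for in the question, whereas the answer above sorts ascending.

```sql
DECLARE @tbl TABLE (
    UserID int,
    ActionID int,
    ActionDesc nvarchar(50),
    ActionDate datetime
);
INSERT INTO @tbl (UserID, ActionID, ActionDesc, ActionDate)
VALUES
    (1, 200, 'Promoted', '2000-01-01'),
    (1, 200, 'Promoted', '2001-01-01'),
    (1, 200, 'Promoted', '2002-02-01'),
    (1, 300, 'Moved',    '2004-03-01'),
    (1, 200, 'Promoted', '2005-03-01'),
    (1, 200, 'Promoted', '2006-03-01'),
    (2, 200, 'Promoted', '2006-01-01'),
    (2, 300, 'Moved',    '2007-01-01'),
    (2, 200, 'Promoted', '2008-01-01');

WITH Numbered AS
(
    -- Step 1: one running number over all rows, by user and then by date.
    SELECT t.*
        , rn = ROW_NUMBER() OVER (ORDER BY t.UserID, t.ActionDate)
    FROM @tbl t
),
Islands AS
(
    -- Step 2: subtract the per-(user, action) number; the difference stays
    -- constant within one consecutive run of the same action.
    SELECT n.*
        , island = n.rn - ROW_NUMBER() OVER (PARTITION BY n.UserID, n.ActionID ORDER BY n.ActionDate)
    FROM Numbered n
),
Ranked AS
(
    -- Step 3: number the rows inside each run, earliest first.
    SELECT i.*
        , pos = ROW_NUMBER() OVER (PARTITION BY i.UserID, i.ActionID, i.island ORDER BY i.ActionDate)
    FROM Islands i
)
SELECT r.UserID
    , r.ActionID
    , r.ActionDesc
    , r.ActionDate
FROM Ranked r
WHERE r.pos = 1          -- keep only the earliest row of each run
ORDER BY r.UserID, r.ActionDate DESC;
```

On the sample data this keeps the rows dated 2005-03-01, 2004-03-01 and 2000-01-01 for user 1 and all three rows for user 2.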