SQL Server - Select the most recent record from a group of similar records

Asked: 2011-06-07 19:07:20

Tags: sql sql-server

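- Scroll down to see the edit I added -

So here's my scenario. I have a table that gets an entry every time someone changes certain data. The reason is that we need to be able to audit all changes.

However, I only want to retrieve the most recent record from each consecutive series of edits made by a user.

So let's say there are three users: A, B, and C.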

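User A makes 10 changes (10 entries in the table).
User B makes 5 changes.
User A makes 3 more changes.
User C makes 2 changes.

What I want back is:
The most recent of the 2 records created by C
The most recent of the 3 records created by A
The most recent of the 5 records created by B
The most recent of the 10 records created by A

That's 4 rows back in total.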

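Here's what I tried. The problem is that RowNum doesn't go back to 1 when LastUpdatedBy changes, because the partition covers all of a user's rows rather than each consecutive run: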

WITH cte AS 
(
    SELECT 
        [LastUpdatedOn]
        ,[LastUpdatedBy]
        ,ROW_NUMBER() OVER(PARTITION BY [LastUpdatedBy] ORDER BY [LastUpdatedOn] DESC) [RowNum]
    FROM [HistoricalTable] 
)           
SELECT 
    [LastUpdatedOn]
    ,[LastUpdatedBy]
    ,RowNum
FROM cte
--WHERE RowNum = 1 
ORDER BY [LastUpdatedOn] DESC;

Here's the output I get (the rows wrapped in ** are the ones I want):

LastUpdatedOn   LastUpdatedBy   RowNum
**2011-06-07 13:07:26.917   629 1**
2011-06-07 12:57:53.700 629 2
2011-06-07 12:57:44.387 629 3
2011-06-07 12:57:34.913 629 4
2011-06-07 12:57:25.040 629 5
2011-06-07 12:57:19.927 629 6
2011-06-07 12:55:17.460 629 7
2011-06-07 12:55:12.287 629 8
2011-06-07 12:30:34.377 629 9
**2011-06-07 11:54:05.727   4   1**
**2011-06-07 11:50:02.723   629 10** (If this number went back to 1, my query would have worked fine)
2011-06-07 11:26:43.053 629 11
2011-06-07 10:54:32.867 629 12
2011-06-07 10:46:32.107 629 13
2011-06-07 10:40:52.937 629 14
**2011-06-07 10:39:50.880   3   1**

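-------------------EDIT--------------------

So I came up with a solution. It isn't very elegant and I'm not sure I like it, but it does the job. It may also give you a better idea of what I'm trying to accomplish.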

DECLARE @temp AS TABLE(LastUpdatedOn datetime, LastUpdatedBy int null, RowNum int);

DECLARE @newTable AS TABLE(LastUpdatedOn datetime, LastUpdatedBy int null);

DECLARE @lastUserId int = 0;

INSERT INTO @temp
SELECT 
    [LastUpdatedOn]
    ,[LastUpdatedBy]
    ,ROW_NUMBER() OVER(ORDER BY [LastUpdatedOn] DESC) [RowNum]
    FROM [HistoricalTable] 

DECLARE @totalRecords int;
SELECT @totalRecords = COUNT(*) FROM @temp;
DECLARE @counter int = 0;
WHILE @counter < @totalRecords BEGIN  -- the counter is incremented first, so this still visits rows 1..@totalRecords
    SET @counter = @counter + 1;

    INSERT INTO @newTable
    SELECT LastUpdatedOn, LastUpdatedBy
    FROM @temp 
    WHERE RowNum = @counter AND (@lastUserId != LastUpdatedBy OR (LastUpdatedBy IS NULL));

    SELECT @lastUserId = LastUpdatedBy  FROM @temp WHERE RowNum = @counter;     
END

SELECT * FROM @newTable;

Data returned:

LastUpdatedOn   LastUpdatedBy
2011-06-07 13:07:26.917 629
2011-06-07 11:54:05.727 4
2011-06-07 11:50:02.723 629
2011-06-07 10:39:50.880 3

4 Answers:

Answer 0 (score: 5)

;with cte as
(
  select *,
    row_number() over(order by LastUpdatedOn) as rn
  from HistoricalTable
)
select C1.LastUpdatedOn,
       C1.LastUpdatedBy
from cte as C1
  left outer join cte as C2
    on C1.rn = C2.rn-1
where C1.LastUpdatedBy <> coalesce(C2.LastUpdatedBy, 0)

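This assigns a row number to every row in LastUpdatedOn order, joins each row to the next one, and checks whether LastUpdatedBy has changed. Watch out for coalesce(C2.LastUpdatedBy, 0). It is there to pick up the very last row, which has no next row, and the 0 needs to be an integer value that is never used as a LastUpdatedBy.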
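On SQL Server 2012 or later (an assumption, since the question does not state a version), the same next-row comparison could be sketched with LEAD instead of a self-join. The table variable below and its six rows are a subset of the sample data from the question, used only for illustration, and the sketch assumes LastUpdatedBy is never NULL.

```sql
-- Sample rows taken from the question's output (a subset, for illustration only).
DECLARE @HistoricalTable TABLE (LastUpdatedOn datetime, LastUpdatedBy int);

INSERT INTO @HistoricalTable (LastUpdatedOn, LastUpdatedBy) VALUES
('2011-06-07 13:07:26.917', 629), ('2011-06-07 12:30:34.377', 629),
('2011-06-07 11:54:05.727', 4),   ('2011-06-07 11:50:02.723', 629),
('2011-06-07 10:40:52.937', 629), ('2011-06-07 10:39:50.880', 3);

-- LEAD (SQL Server 2012+) returns the LastUpdatedBy of the next row in time
-- order, or NULL when there is no next row.
SELECT LastUpdatedOn, LastUpdatedBy
FROM (
    SELECT LastUpdatedOn,
           LastUpdatedBy,
           LEAD(LastUpdatedBy) OVER (ORDER BY LastUpdatedOn) AS NextUpdatedBy
    FROM @HistoricalTable
) AS t
-- Keep a row when the next edit is by someone else, or when it is the last row.
-- Assumes LastUpdatedBy itself is never NULL.
WHERE NextUpdatedBy IS NULL
   OR NextUpdatedBy <> LastUpdatedBy
ORDER BY LastUpdatedOn DESC;
```

For these six rows the query should return four rows: 13:07:26.917 (629), 11:54:05.727 (4), 11:50:02.723 (629) and 10:39:50.880 (3). Because a missing next row shows up as NULL, no sentinel value such as 0 is needed.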

Answer 1 (score: 2)

Not sure if I'm missing something in your question, but doesn't the following SQL answer it?

declare @HistoricalTable table (LastUpdatedOn datetime, LastUpdatedBy int);

insert into @HistoricalTable (LastUpdatedOn, LastUpdatedBy) values 
('2011-06-07 13:07:26.917', 629),('2011-06-07 12:57:53.700', 629),
('2011-06-07 12:57:44.387', 629),('2011-06-07 12:57:34.913', 629),
('2011-06-07 12:57:25.040', 629),('2011-06-07 12:57:19.927', 629),
('2011-06-07 12:55:17.460', 629),('2011-06-07 12:55:12.287', 629),
('2011-06-07 12:30:34.377', 629),('2011-06-07 11:54:05.727', 4),
('2011-06-07 11:50:02.723', 629),('2011-06-07 11:26:43.053', 629),
('2011-06-07 10:54:32.867', 629),('2011-06-07 10:46:32.107', 629),
('2011-06-07 10:40:52.937', 629),('2011-06-07 10:39:50.880', 3);

select 
 latest.* 
from
(
 select *, rank() over (partition by LastUpdatedBy order by LastUpdatedOn desc) as UpdateRank 
  from @HistoricalTable
) latest
where
 latest.UpdateRank = 1
order by
 latest.LastUpdatedBy;

LastUpdatedOn           LastUpdatedBy   UpdateRank
2011-06-07 10:39:50.880            3            1
2011-06-07 11:54:05.727            4            1
2011-06-07 13:07:26.917          629            1

Answer 2 (score: 1)

It struck me this morning that this is an islands problem. Here's my solution:

CREATE TABLE #tmp (
 LastUpdatedBy INT,
 LastUpdatedOn DATETIME
)

INSERT  INTO #tmp
        ( LastUpdatedOn, LastUpdatedBy )
VALUES  ( '2011-06-07 13:07:26.917', 629 ),
        ( '2011-06-07 12:57:53.700', 629 ),
        ( '2011-06-07 12:57:44.387', 629 ),
        ( '2011-06-07 12:57:34.913', 629 ),
        ( '2011-06-07 12:57:25.040', 629 ),
        ( '2011-06-07 12:57:19.927', 629 ),
        ( '2011-06-07 12:55:17.460', 629 ),
        ( '2011-06-07 12:55:12.287', 629 ),
        ( '2011-06-07 12:30:34.377', 629 ),
        ( '2011-06-07 11:54:05.727', 4 ),
        ( '2011-06-07 11:50:02.723', 629 ),
        ( '2011-06-07 11:26:43.053', 629 ),
        ( '2011-06-07 10:54:32.867', 629 ),
        ( '2011-06-07 10:46:32.107', 629 ),
        ( '2011-06-07 10:40:52.937', 629 ),
        ( '2011-06-07 10:39:50.880', 3 ) ;

WITH    cte
          AS ( SELECT   [LastUpdatedOn],
                        [LastUpdatedBy],
                        ROW_NUMBER() OVER ( PARTITION BY [LastUpdatedBy] ORDER BY [LastUpdatedOn] DESC ) - ROW_NUMBER() OVER ( ORDER BY [LastUpdatedOn] DESC ) AS [Island]
               FROM     #tmp
             ),
        cte2
          AS ( SELECT   *,
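                        -- Partition by user as well: two different users can end up with the same difference
                        ROW_NUMBER() OVER ( PARTITION BY [LastUpdatedBy], [Island] ORDER BY [LastUpdatedOn] DESC ) AS [rn]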
               FROM     cte
             )
    SELECT  [LastUpdatedOn],
            [LastUpdatedBy]
    FROM    cte2
    WHERE   [rn] = 1
    ORDER BY [LastUpdatedOn] DESC ;

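The "trick" here is to notice that if you track row_number both within the partition and over the whole set, the difference between the two changes whenever the partition changes.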
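To see the difference at work, the sketch below prints both row numbers and their difference for six of the question's rows. The table variable @Sample and the column aliases RnPerUser and RnOverall are assumptions made for illustration.

```sql
-- Six rows from the question's sample data (a subset, for illustration only).
DECLARE @Sample TABLE (LastUpdatedOn datetime, LastUpdatedBy int);

INSERT INTO @Sample (LastUpdatedOn, LastUpdatedBy) VALUES
('2011-06-07 13:07:26.917', 629), ('2011-06-07 12:30:34.377', 629),
('2011-06-07 11:54:05.727', 4),   ('2011-06-07 11:50:02.723', 629),
('2011-06-07 10:40:52.937', 629), ('2011-06-07 10:39:50.880', 3);

SELECT LastUpdatedOn,
       LastUpdatedBy,
       -- Counter that only advances on this user's rows
       ROW_NUMBER() OVER (PARTITION BY LastUpdatedBy ORDER BY LastUpdatedOn DESC) AS RnPerUser,
       -- Counter that advances on every row
       ROW_NUMBER() OVER (ORDER BY LastUpdatedOn DESC) AS RnOverall,
       -- Constant within a consecutive run of the same user
       ROW_NUMBER() OVER (PARTITION BY LastUpdatedBy ORDER BY LastUpdatedOn DESC)
         - ROW_NUMBER() OVER (ORDER BY LastUpdatedOn DESC) AS Island
FROM @Sample
ORDER BY LastUpdatedOn DESC;

-- Expected rows:
-- LastUpdatedOn            LastUpdatedBy  RnPerUser  RnOverall  Island
-- 2011-06-07 13:07:26.917  629            1          1           0
-- 2011-06-07 12:30:34.377  629            2          2           0
-- 2011-06-07 11:54:05.727  4              1          3          -2
-- 2011-06-07 11:50:02.723  629            3          4          -1
-- 2011-06-07 10:40:52.937  629            4          5          -1
-- 2011-06-07 10:39:50.880  3              1          6          -5
```

The two runs by user 629 get different Island values (0 and -1) because user 4's row advances the overall counter but not 629's own counter. The value is only guaranteed to separate runs of the same user, so it is safest to read Island together with LastUpdatedBy.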

Answer 3 (score: 0)

This is completely untested, but it might form the basis of a working solution:

SELECT
    [Outer].[LastUpdatedOn],
    [Outer].[LastUpdatedBy]
FROM [HistoricalTable] AS [Outer]
WHERE NOT EXISTS
(
    SELECT *
    FROM [HistoricalTable] AS [Middle]
    WHERE [Middle].[LastUpdatedBy] = [Outer].[LastUpdatedBy]
        AND [Middle].[LastUpdatedOn] > [Outer].[LastUpdatedOn]
        AND [Middle].[LastUpdatedOn] <= ISNULL(
        (
            SELECT
                MIN([Inner].[LastUpdatedOn])
            FROM [HistoricalTable] AS [Inner]
            WHERE [Inner].[LastUpdatedBy] != [Outer].[LastUpdatedBy]
                AND [Inner].[LastUpdatedOn] > [Outer].[LastUpdatedOn]
        ), [Middle].[LastUpdatedOn])
)

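Even if this approach works, performance is likely to be terrible unless you have only a handful of rows.

For every row in the table (the context row), it checks that the same user has no other row between the context row and the earliest row that is both newer than the context row and belongs to a different user. If no such row by a different user exists, it checks that the same user has no newer row at all.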