Question

I currently have a database table in the following format:

ID  |  DateTime             |  PID   |  TIU
1   |  2013-11-18 00:15:00  |  1551  |  1005
2   |  2013-11-18 00:16:03  |  1551  |  1885
3   |  2013-11-18 00:16:30  |  9110  |  75527
4   |  2013-11-18 00:22:01  |  1022  |  75
5   |  2013-11-18 00:22:09  |  1019  |  1311
6   |  2013-11-18 00:23:52  |  1022  |  89
7   |  2013-11-18 00:24:19  |  1300  |  44433
8   |  2013-11-18 00:38:57  |  9445  |  2010

I need to use the DateTime column to identify gaps of more than 5 minutes in the process.

Here is an example of what I want to achieve:

ID  |  DateTime             |  PID   |  TIU
3   |  2013-11-18 00:16:30  |  9110  |  75527
4   |  2013-11-18 00:22:01  |  1022  |  75
7   |  2013-11-18 00:24:19  |  1300  |  44433
8   |  2013-11-18 00:38:57  |  9445  |  2010

ID 3 is the last row before the 5 min 31 s gap, and ID 4 is the next row after it. ID 7 is the last row before the 14 min 38 s gap, and ID 8 is the next available record.
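The two gap lengths can be checked by subtracting the timestamps from the first table. This is a small Python illustration; `gap` and `FIVE_MINUTES` are hypothetical names.

```python
from datetime import datetime, timedelta

FIVE_MINUTES = timedelta(minutes=5)

def gap(before: str, after: str) -> timedelta:
    """Time elapsed between two 'YYYY-MM-DD HH:MM:SS' timestamps."""
    return datetime.fromisoformat(after) - datetime.fromisoformat(before)

gap_3_4 = gap("2013-11-18 00:16:30", "2013-11-18 00:22:01")  # ID 3 -> ID 4
gap_7_8 = gap("2013-11-18 00:24:19", "2013-11-18 00:38:57")  # ID 7 -> ID 8
print(gap_3_4, gap_3_4 > FIVE_MINUTES)  # 0:05:31 True
print(gap_7_8, gap_7_8 > FIVE_MINUTES)  # 0:14:38 True
```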

I am trying to do this in SQL, but I can do it in C# instead if necessary.

I have tried lots of inner joins, but the table has more than 3 million rows, so performance suffers badly.
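The asker mentions C# as a fallback. Here is a minimal sketch of that application-side route, written in Python to illustrate the idea; `gap_pairs` and the `(ID, DateTime)` tuples are hypothetical, and PID/TIU are left out. This is a different technique from both the self-joins and the answer below. It assumes the rows are read once in DateTime order (for example from a query with `ORDER BY [DateTime]`), so each row only has to be compared with the previous one. Beyond the ordered read itself, that is a single linear pass.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator, Optional, Tuple

Row = Tuple[int, datetime]  # (ID, DateTime); PID and TIU omitted for brevity

def gap_pairs(rows: Iterable[Row],
              gap: timedelta = timedelta(minutes=5)) -> Iterator[Tuple[Row, Row]]:
    """Yield (last row before, first row after) for every gap longer than `gap`.

    Assumes `rows` arrive ordered by DateTime, so each row is compared
    only with its immediate predecessor: one pass, no joins."""
    prev: Optional[Row] = None
    for row in rows:
        if prev is not None and row[1] - prev[1] > gap:
            yield prev, row
        prev = row

SAMPLE = [(i, datetime.fromisoformat(s)) for i, s in [
    (1, "2013-11-18 00:15:00"), (2, "2013-11-18 00:16:03"),
    (3, "2013-11-18 00:16:30"), (4, "2013-11-18 00:22:01"),
    (5, "2013-11-18 00:22:09"), (6, "2013-11-18 00:23:52"),
    (7, "2013-11-18 00:24:19"), (8, "2013-11-18 00:38:57"),
]]

for before, after in gap_pairs(SAMPLE):
    print(before[0], "->", after[0])  # prints "3 -> 4", then "7 -> 8"
```

Because `gap_pairs` is a generator, it can consume rows straight from a database cursor without loading all 3 million into memory.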

Answer 1

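Here is a CTE solution, but as has already been pointed out, it may not always perform well: because we have to evaluate a function (DATEDIFF) over the DateTime column, most indexes are useless: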

-- Sample data from the question
declare @t table (ID int not null,[DateTime] datetime not null,
                  PID int not null,TIU int not null)
insert into @t(ID,[DateTime],PID,TIU) values
(1,'2013-11-18 00:15:00',1551,1005  ),
(2,'2013-11-18 00:16:03',1551,1885  ),
(3,'2013-11-18 00:16:30',9110,75527 ),
(4,'2013-11-18 00:22:01',1022,75    ),
(5,'2013-11-18 00:22:09',1019,1311  ),
(6,'2013-11-18 00:23:52',1022,89    ),
(7,'2013-11-18 00:24:19',1300,44433 ),
(8,'2013-11-18 00:38:57',9445,2010  )

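-- Islands (recursive): the anchor takes rows with no earlier row within 300 s;
-- the recursive part adds later rows less than 300 s after the current row
;With Islands as (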
    select ID as MinID,[DateTime],ID as RecID from @t t1
    where not exists
        (select * from @t t2
            where t2.ID < t1.ID and --Or by date, if needed
                    --Use 300 seconds to avoid most transition issues
            DATEDIFF(second,t2.[DateTime],t1.[DateTime]) < 300
        )
    union all
    select i.MinID,t2.[DateTime],t2.ID
    from Islands i
        inner join
        @t t2
            on
                i.RecID < t2.ID and
                DATEDIFF(second,i.[DateTime],t2.[DateTime]) < 300
), Ends as (
    -- First (MinID) and last (MaxID) row ID of each island
    select MinID,MAX(RecID) as MaxID from Islands group by MinID
)
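select * from @t t
where exists(select * from Ends e where e.MinID = t.ID or e.MaxID = t.ID)
-- Lift the default 100-level recursion limit: a long island recurses once per row
option (maxrecursion 0)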
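The comment about using 300 seconds refers to how DATEDIFF works: it counts the datepart boundaries crossed between the two values rather than measuring elapsed time. The hypothetical Python helper below emulates that counting for minutes and seconds. It shows why `DATEDIFF(minute, ...) < 5` can misjudge a gap of just over 4 minutes, while counting seconds is off by at most a fraction of a second. That residual error is why seconds avoid most, not all, transition issues.

```python
from datetime import datetime, timedelta

EPOCH = datetime.min  # midnight, so minute and second boundaries line up with it
MINUTE = timedelta(minutes=1)
SECOND = timedelta(seconds=1)

def datediff(unit: timedelta, start: datetime, end: datetime) -> int:
    """Hypothetical emulation of DATEDIFF(unit, start, end) for minute/second:
    the number of `unit` boundaries crossed, not the elapsed time."""
    return (end - EPOCH) // unit - (start - EPOCH) // unit

a = datetime(2013, 11, 18, 0, 16, 59)
b = datetime(2013, 11, 18, 0, 21, 0)  # only 4 min 1 s after a
print(datediff(MINUTE, a, b))  # 5: "< 5 minutes" would wrongly call this a gap
print(datediff(SECOND, a, b))  # 241: comfortably below 300
```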

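This also returns a row for ID 1, because that row has no preceding row within 5 minutes; if necessary, that should be easy to exclude in the final select.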

I am assuming that ID can be used as a proxy for increasing dates: if one row has a higher ID than another, its DateTime is also later.
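That assumption is easy to check before relying on it. Here is a minimal Python sketch on the sample rows; `ids_track_dates` is a hypothetical helper. On the real 3-million-row table you would normally run the equivalent check in the database, so this only illustrates the property being tested.

```python
from datetime import datetime

ROWS = [(i, datetime.fromisoformat(s)) for i, s in [
    (1, "2013-11-18 00:15:00"), (2, "2013-11-18 00:16:03"),
    (3, "2013-11-18 00:16:30"), (4, "2013-11-18 00:22:01"),
    (5, "2013-11-18 00:22:09"), (6, "2013-11-18 00:23:52"),
    (7, "2013-11-18 00:24:19"), (8, "2013-11-18 00:38:57"),
]]

def ids_track_dates(rows) -> bool:
    """True if, taken in ID order, DateTime never goes backwards.

    This is the property the answer relies on when it compares IDs
    (t2.ID < t1.ID) instead of dates."""
    by_id = sorted(rows, key=lambda r: r[0])
    return all(a[1] <= b[1] for a, b in zip(by_id, by_id[1:]))

print(ids_track_dates(ROWS))  # True
```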

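Islands is a recursive CTE. The top half (the anchor) selects only those rows that have no preceding row within 5 minutes. For each of these rows we select the ID twice (as MinID and RecID) and keep the DateTime.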

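In the recursive part, we try to find a new row in the table that can be "added" to an existing Islands row, on the condition that the new row is less than 5 minutes (300 seconds) after the island's current end point.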
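To make the recursion concrete, here is a minimal Python emulation of the Islands CTE on the sample data. `ROWS`, `within` and `islands` are hypothetical names, and the nested loops are only meant for a handful of rows, not 3 million. It follows the same two steps as the SQL. Like `union all`, it keeps every row it derives, so you can see the same (MinID, RecID) combination produced several times. That duplication is part of why this approach can get slow on a large table.

```python
from collections import deque
from datetime import datetime

ROWS = [(i, datetime.fromisoformat(s)) for i, s in [
    (1, "2013-11-18 00:15:00"), (2, "2013-11-18 00:16:03"),
    (3, "2013-11-18 00:16:30"), (4, "2013-11-18 00:22:01"),
    (5, "2013-11-18 00:22:09"), (6, "2013-11-18 00:23:52"),
    (7, "2013-11-18 00:24:19"), (8, "2013-11-18 00:38:57"),
]]
LIMIT_SECONDS = 300  # same threshold as the SQL

def within(earlier: datetime, later: datetime) -> bool:
    """DATEDIFF(second, earlier, later) < 300 for whole-second timestamps."""
    return (later - earlier).total_seconds() < LIMIT_SECONDS

def islands(rows):
    """Return the (MinID, DateTime, RecID) rows the Islands CTE would produce."""
    # Anchor: rows with no lower-ID row less than 300 s before them
    anchors = [(rid, dt, rid) for rid, dt in rows
               if not any(oid < rid and within(odt, dt) for oid, odt in rows)]
    result = list(anchors)
    pending = deque(anchors)  # processed breadth-first, level by level
    while pending:
        min_id, dt, rec_id = pending.popleft()
        # Recursive part: every later row less than 300 s after this one
        for oid, odt in rows:
            if rec_id < oid and within(dt, odt):
                new_row = (min_id, odt, oid)
                result.append(new_row)  # union all: duplicates are kept
                pending.append(new_row)
    return result

for min_id, _, rec_id in islands(ROWS):
    if min_id == 4:
        print(min_id, rec_id)
```

For the island starting at ID 4, this prints eight rows. RecID 6 appears twice and RecID 7 four times, because each of rows 4, 5 and 6 reaches row 7 directly.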

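Once the recursion is complete, we discard the intermediate rows the CTE produced. For example, for the "4" island it generated the following rows (MinID, DateTime, RecID). Each is shown once here, but the recursive join matches every later row less than 300 seconds after the current one, not just the next row, so several of them are actually produced more than once: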

4,00:22:01,4
4,00:22:09,5
4,00:23:52,6
4,00:24:19,7

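We only care about the last of these rows: it tells us we have found a time "island" running from ID 4 to ID 7. That is what the second CTE (Ends) finds for us.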
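The Ends step and the final select can be mirrored in a few lines of Python. The sketch starts from the distinct (MinID, RecID) pairs that Islands yields for the sample data, worked out by hand from the recursion. `ends`, `boundary_ids` and `gap_rows` are hypothetical names. `boundary_ids` reproduces the query as written, including ID 1. `gap_rows` is one possible way to apply the exclusion mentioned earlier. It keeps an island's MinID only if another island comes before it, and its MaxID only if another island comes after it. This also drops the table's final row when that row closes the last island. On the sample, it yields exactly the IDs in the question's expected output.

```python
# Distinct (MinID, RecID) pairs the Islands CTE yields for the sample data
ISLAND_ROWS = [(1, 1), (1, 2), (1, 3),
               (4, 4), (4, 5), (4, 6), (4, 7),
               (8, 8)]

def ends(island_rows):
    """Equivalent of: select MinID, MAX(RecID) as MaxID from Islands group by MinID."""
    result = {}
    for min_id, rec_id in island_rows:
        result[min_id] = max(rec_id, result.get(min_id, rec_id))
    return result

def boundary_ids(island_rows):
    """IDs that are some island's MinID or MaxID, like the final select."""
    e = ends(island_rows)
    return sorted(set(e) | set(e.values()))

def gap_rows(island_rows):
    """Only the rows directly before or after a gap (relies on the ID-order assumption)."""
    e = ends(island_rows)
    starts = sorted(e)  # MinID of each island, in chronological order
    keep = set()
    for i, start in enumerate(starts):
        if i > 0:                    # island begins right after a gap
            keep.add(start)
        if i < len(starts) - 1:      # island ends right before a gap
            keep.add(e[start])
    return sorted(keep)

print(boundary_ids(ISLAND_ROWS))  # [1, 3, 4, 7, 8]
print(gap_rows(ISLAND_ROWS))      # [3, 4, 7, 8]
```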
