Get only the rows where the time difference between the current row and the next row is less than 5 minutes

Posted: 2017-11-07 23:48:18

Tags: sql sql-server tsql sql-server-2012

T-SQL query: accounts with transactions done at different times

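I need help finding a way to pull only the records where the txn_time difference between the current row and the next row is less than 5 minutes. txn_time is already sorted.

Looking at the attached image, only rows 1, 2, 3, 6, 7 and 8 should be shown, because each of them is less than 5 minutes away from an adjacent row.

Any ideas would help.

Sample data: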

rowno  txn_Date_Time  txn_time          accountNo
1      2017-10-31     11:50:47.0000000  98989898
2      2017-10-31     11:52:23.0000000  98989898
3      2017-10-31     11:52:23.0000000  98989898
4      2017-10-31     11:59:03.0000000  98989898
5      2017-10-31     12:05:13.0000000  98989898
6      2017-10-31     12:41:06.0000000  98989898
7      2017-10-31     12:42:44.0000000  98989898
8      2017-10-31     12:44:02.0000000  98989898
9      2017-10-31     15:23:19.0000000  98989898
10     2017-10-31     16:19:17.0000000  98989898
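Judging by the expected output (rows 3 and 8 are included only because of the row before them), the rule is "keep a row if either neighbour is less than 5 minutes away". A minimal Python sketch of that rule, checked against the sample data, might look like this. It is only an illustration of the logic, not T-SQL; the function name is made up, and the single date 2017-10-31 is assumed from the sample rows.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (rowno, txn_time); all share one date and one accountNo.
SAMPLE = [
    (1, "11:50:47"), (2, "11:52:23"), (3, "11:52:23"), (4, "11:59:03"),
    (5, "12:05:13"), (6, "12:41:06"), (7, "12:42:44"), (8, "12:44:02"),
    (9, "15:23:19"), (10, "16:19:17"),
]

THRESHOLD = timedelta(minutes=5)


def rows_near_neighbour(rows):
    """Return rownos whose previous or next row (in time order) is < 5 minutes away."""
    # Combine the assumed date with each time so gaps are real datetime differences.
    times = [(rowno, datetime.strptime("2017-10-31 " + t, "%Y-%m-%d %H:%M:%S"))
             for rowno, t in rows]
    times.sort(key=lambda r: (r[1], r[0]))  # order by time, then rowno as a tie-breaker
    keep = []
    for i, (rowno, ts) in enumerate(times):
        prev_close = i > 0 and ts - times[i - 1][1] < THRESHOLD
        next_close = i + 1 < len(times) and times[i + 1][1] - ts < THRESHOLD
        if prev_close or next_close:
            keep.append(rowno)
    return keep


print(rows_near_neighbour(SAMPLE))  # [1, 2, 3, 6, 7, 8]
```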

3 answers:

Answer 0 (score: 3)

In SQL Server 2012+, using the LEAD and LAG functions is more efficient than a self-join.

WITH
CTE
AS
(
    SELECT
        rowno
        ,txn_Date_Time
        ,txn_time
        ,accountNo
        ,LEAD(txn_time) OVER (PARTITION BY accountNo ORDER BY txn_time, rowno) AS next_time
        ,LAG(txn_time) OVER (PARTITION BY accountNo ORDER BY txn_time, rowno) AS prev_time
    FROM T
)
SELECT
    rowno
    ,txn_Date_Time
    ,txn_time
    ,accountNo
FROM CTE
WHERE
    DATEDIFF(second, prev_time, txn_time) < 5 * 60
    OR
    DATEDIFF(second, txn_time, next_time) < 5 * 60
ORDER BY txn_time, rowno;
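To see what PARTITION BY accountNo contributes, the LEAD/LAG logic above can be mirrored in Python. This is a sketch under assumptions: the two-account rows below are hypothetical (the question's sample has only one account), and the result is returned sorted by rowno rather than by txn_time for easy comparison.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical rows (rowno, accountNo, txn timestamp) spanning two accounts,
# so the effect of PARTITION BY accountNo is visible.
ROWS = [
    (1, 98989898, datetime(2017, 10, 31, 11, 50, 47)),
    (2, 11111111, datetime(2017, 10, 31, 11, 51, 0)),
    (3, 98989898, datetime(2017, 10, 31, 11, 52, 23)),
    (4, 11111111, datetime(2017, 10, 31, 12, 30, 0)),
]


def lead_lag_filter(rows, max_gap_seconds=5 * 60):
    """Keep rows whose LAG or LEAD neighbour in the same account is < max_gap_seconds away."""
    # Sort by the partition key first, then by the window's ORDER BY keys (time, rowno).
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    keep = []
    for _account, group in groupby(ordered, key=lambda r: r[1]):
        part = list(group)
        for i, (rowno, _acct, ts) in enumerate(part):
            prev_ts = part[i - 1][2] if i > 0 else None              # LAG(...)
            next_ts = part[i + 1][2] if i + 1 < len(part) else None  # LEAD(...)
            # In SQL a NULL neighbour makes that comparison UNKNOWN, so only the other side counts.
            near_prev = prev_ts is not None and (ts - prev_ts).total_seconds() < max_gap_seconds
            near_next = next_ts is not None and (next_ts - ts).total_seconds() < max_gap_seconds
            if near_prev or near_next:
                keep.append(rowno)
    return sorted(keep)  # sorted by rowno for easy comparison


print(lead_lag_filter(ROWS))                                 # [1, 3]
print(lead_lag_filter([(r, 0, ts) for r, _a, ts in ROWS]))  # [1, 2, 3]
```

The second call puts every row in one partition: row 2 then sits between rows 1 and 3 in time and is kept, which is what a window without PARTITION BY would do.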

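Answer 1 (score: 2)

Since you're using SQL Server 2012, you can use window offset functions such as LAG and LEAD. However, @vladimir beat me to it; he and I put together similar solutions.

To make things interesting, I'll demonstrate how to optimize your query so that neither LAG nor LEAD forces SQL Server to sort the data to satisfy the query. The type of index I'm creating is called a POC (partitioning, ordering, covering) index, which is discussed here.

For simplicity, I'm using a single datetime column, txn_date_time. I'll create two identical tables and run my solution against both of them. The second table will have a POC index.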
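Why an index in the right order can remove the sort: if rows already arrive ordered by the window's ORDER BY key (as a scan of a matching index would deliver them), LAG and LEAD need only one row of look-behind and one row of look-ahead. The Python sketch below is a conceptual model of that single pass, not SQL Server's actual implementation; the simplified (rowno, seconds since midnight) row shape is hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical simplified row: (rowno, seconds since midnight).
Row = Tuple[int, int]
Out = Tuple[int, Optional[int], Optional[int]]  # (rowno, gap to LAG row, gap to LEAD row)


def stream_lag_lead(rows: Iterable[Row]) -> Iterator[Out]:
    """One pass over rows that already arrive in ORDER BY order; no sort step.

    Only the previous timestamp and one pending row are kept in memory.
    Raises ValueError if the input turns out not to be ordered.
    """
    prev_t: Optional[int] = None   # timestamp of the row before `pending`
    pending: Optional[Row] = None  # row still waiting for its LEAD value
    for row in rows:
        if pending is not None:
            if row[1] < pending[1]:
                raise ValueError("input is not ordered; a sort would be needed")
            rowno, t = pending
            yield rowno, (None if prev_t is None else t - prev_t), row[1] - t
            prev_t = t
        pending = row
    if pending is not None:  # the last row has no LEAD neighbour
        rowno, t = pending
        yield rowno, (None if prev_t is None else t - prev_t), None


# 11:50:47, 11:52:23, 11:52:23 from the sample data, as seconds since midnight
print(list(stream_lag_lead([(1, 42647), (2, 42743), (3, 42743)])))
# [(1, None, 96), (2, 96, 0), (3, 0, None)]
```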

Sample data

-- sample data
if object_id('tempdb..#table')  is not null drop table #table;
if object_id('tempdb..#table2') is not null drop table #table2;
go
create table #table
(
  rowno int identity,
  txn_date_time datetime,
  accountNo int
);
create table #table2
(
  rowno int identity,
  txn_date_time datetime,
  accountNo int
);

-- populate #table 
declare @dt varchar(9) = '20171031 ', @acn int = 98989898;
insert #table (txn_date_time, accountNo)
values
(@dt+'11:50:47',@acn), (@dt+'11:52:23', @acn), (@dt+'11:52:23',@acn),
(@dt+'11:59:03',@acn), (@dt+'12:05:13', @acn), (@dt+'12:41:06',@acn),
(@dt+'12:42:44',@acn), (@dt+'12:44:02', @acn), (@dt+'15:23:19',@acn),(@dt+'16:19:17',@acn);
-- populate #table2    
insert #table2 (txn_date_time, accountNo)
select txn_date_time, accountNo from #table;

-- create unique clustered index on #table2
create unique clustered index uq_cl_table2 on #table2(txn_date_time, rowno);
GO
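The unique clustered index is keyed on (txn_date_time, rowno) rather than txn_date_time alone because rows 2 and 3 share the same timestamp. A small Python check over the values loaded into #table above illustrates this; the helper name is made up for the example.

```python
from collections import Counter

# (rowno, txn_date_time) pairs matching the rows loaded into #table above.
SAMPLE_ROWS = [
    (1, "2017-10-31 11:50:47"), (2, "2017-10-31 11:52:23"), (3, "2017-10-31 11:52:23"),
    (4, "2017-10-31 11:59:03"), (5, "2017-10-31 12:05:13"), (6, "2017-10-31 12:41:06"),
    (7, "2017-10-31 12:42:44"), (8, "2017-10-31 12:44:02"), (9, "2017-10-31 15:23:19"),
    (10, "2017-10-31 16:19:17"),
]


def duplicate_keys(keys):
    """Return the key values that occur more than once."""
    return sorted(k for k, n in Counter(keys).items() if n > 1)


# txn_date_time alone is not unique ...
print(duplicate_keys(t for _, t in SAMPLE_ROWS))       # ['2017-10-31 11:52:23']
# ... but (txn_date_time, rowno) is, so it can back a UNIQUE index.
print(duplicate_keys((t, r) for r, t in SAMPLE_ROWS))  # []
```

Adding rowno also gives tied timestamps a deterministic order.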

Run the same query against both tables, keeping in mind that the second table has the POC index.

-- #table
select rowno, txn_date_time, accountNo
from
(
  select rowno, txn_date_time, accountNo, 
    -- gaps in seconds: datediff(minute, ...) counts minute boundaries, not elapsed minutes
    nextDt = datediff(second, txn_date_time, lead(txn_date_time, 1) over (order by txn_date_time)),
    prevDt = datediff(second, lag(txn_date_time, 1)  over (order by txn_date_time), txn_date_time)
  from #table
) fixedDates
where nextDt < 5 * 60 or prevDt < 5 * 60;
-- #table2
select rowno, txn_date_time, accountNo
from
(
  select rowno, txn_date_time, accountNo, 
    nextDt = datediff(second, txn_date_time, lead(txn_date_time, 1) over (order by txn_date_time)),
    prevDt = datediff(second, lag(txn_date_time, 1)  over (order by txn_date_time), txn_date_time)
  from #table2
) fixedDates
where nextDt < 5 * 60 or prevDt < 5 * 60;
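SQL Server's DATEDIFF returns the number of datepart boundaries crossed between the two values, not the elapsed time rounded down. The Python model below is an approximation written for illustration (not SQL Server code). It shows that a minute-based comparison against 5 can include a pair almost 6 minutes apart (with <= 5) or exclude a pair only about 4 minutes apart (with < 5). Comparing DATEDIFF(second, ...) against 5 * 60 follows the "less than 5 minutes" requirement more closely.

```python
from datetime import datetime


def datediff_minute(start: datetime, end: datetime) -> int:
    """Model of DATEDIFF(minute, start, end): counts minute boundaries crossed."""
    # Truncate both values to the minute, then count whole minutes between them.
    s = start.replace(second=0, microsecond=0)
    e = end.replace(second=0, microsecond=0)
    return int((e - s).total_seconds() // 60)


def elapsed_seconds(start: datetime, end: datetime) -> int:
    """Actual elapsed time in whole seconds."""
    return int((end - start).total_seconds())


a = datetime(2017, 10, 31, 12, 0, 59)
b = datetime(2017, 10, 31, 12, 5, 0)
print(datediff_minute(a, b), elapsed_seconds(a, b))  # 5 241  (only 4:01 apart)

c = datetime(2017, 10, 31, 12, 0, 0)
d = datetime(2017, 10, 31, 12, 5, 59)
print(datediff_minute(c, d), elapsed_seconds(c, d))  # 5 359  (almost 6 minutes apart)
```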

Note the execution plans: adding the POC index removes the sort and makes the query four times more efficient.


Answer 2 (score: 1)

Try a self-join to attach the next row, then UNION that with a second query that self-joins the previous row:

-- rows whose next row (rowno + 1) is less than 5 minutes later
SELECT
     t1.rowno
    ,t1.txn_Date_time
    ,t1.txn_time
    ,t1.accountNo
FROM [table] t1
JOIN [table] t2
    ON t2.rowno = t1.rowno + 1
WHERE DATEDIFF(SECOND, t1.txn_time, t2.txn_time) < 5 * 60

UNION

-- rows whose previous row (rowno - 1) is less than 5 minutes earlier
SELECT
     t1.rowno
    ,t1.txn_Date_time
    ,t1.txn_time
    ,t1.accountNo
FROM [table] t1
JOIN [table] t2
    ON t2.rowno = t1.rowno - 1
WHERE DATEDIFF(SECOND, t2.txn_time, t1.txn_time) < 5 * 60
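Mirroring the self-join version in Python makes its assumption visible. Neighbours are found by rowno plus or minus 1 rather than by sorting on time, so rowno must be gap-free and must follow time order. This is an illustrative sketch: the dictionary layout and function name are made up, and the times come from the question's sample data.

```python
from datetime import datetime, timedelta

# rowno -> txn_time from the question's sample data (all on the same date).
TIMES = {
    1: "11:50:47", 2: "11:52:23", 3: "11:52:23", 4: "11:59:03", 5: "12:05:13",
    6: "12:41:06", 7: "12:42:44", 8: "12:44:02", 9: "15:23:19", 10: "16:19:17",
}


def self_join_union(times, limit=timedelta(minutes=5)):
    """Emulate JOIN ON t2.rowno = t1.rowno + 1 / - 1, combined with UNION."""
    ts = {r: datetime.strptime(t, "%H:%M:%S") for r, t in times.items()}
    # First query: a matching "next" row (rowno + 1) less than `limit` later.
    with_next = {r for r in ts if r + 1 in ts and ts[r + 1] - ts[r] < limit}
    # Second query: a matching "previous" row (rowno - 1) less than `limit` earlier.
    with_prev = {r for r in ts if r - 1 in ts and ts[r] - ts[r - 1] < limit}
    # UNION removes duplicates, as a Python set union does.
    return sorted(with_next | with_prev)


print(self_join_union(TIMES))  # [1, 2, 3, 6, 7, 8]

# With rowno 7 missing, rows 6 and 8 (under 3 minutes apart) are never compared.
gapped = {r: t for r, t in TIMES.items() if r != 7}
print(self_join_union(gapped))  # [1, 2, 3]
```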