T-SQL query: accounts with transactions done at different times
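I need help finding a way to pull only the records where the difference in txn_time between the current row and the next row is less than 5 minutes. The rows are sorted by txn_time.
In the sample data below, only rows 1, 2, 3, 6, 7 and 8 should be returned, because the time difference between each of those rows and a neighboring row is less than 5 minutes.
Any ideas would be helpful.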
rowno  txn_Date_Time  txn_time          accountNo
1      2017-10-31     11:50:47.0000000  98989898
2      2017-10-31     11:52:23.0000000  98989898
3      2017-10-31     11:52:23.0000000  98989898
4      2017-10-31     11:59:03.0000000  98989898
5      2017-10-31     12:05:13.0000000  98989898
6      2017-10-31     12:41:06.0000000  98989898
7      2017-10-31     12:42:44.0000000  98989898
8      2017-10-31     12:44:02.0000000  98989898
9      2017-10-31     15:23:19.0000000  98989898
10     2017-10-31     16:19:17.0000000  98989898
Answer 0 (score: 3)
In SQL Server 2012+, using the LEAD and LAG functions is more efficient than a self-join.
WITH CTE AS
(
    SELECT
         rowno
        ,txn_Date_Time
        ,txn_time
        ,accountNo
        ,LEAD(txn_time) OVER (PARTITION BY accountNo ORDER BY txn_time, rowno) AS next_time
        ,LAG(txn_time) OVER (PARTITION BY accountNo ORDER BY txn_time, rowno) AS prev_time
    FROM T
)
SELECT
     rowno
    ,txn_Date_Time
    ,txn_time
    ,accountNo
FROM CTE
WHERE
    DATEDIFF(second, prev_time, txn_time) < 5 * 60
    OR
    DATEDIFF(second, txn_time, next_time) < 5 * 60
ORDER BY txn_time, rowno;
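A note on why the filter above compares seconds rather than minutes: DATEDIFF returns the number of datepart boundaries crossed between its two arguments, not the elapsed time rounded down. The small sketch below illustrates this; the timestamps and variable names are made up for illustration and do not come from the question's data.

```sql
-- Illustrative values only (not taken from the question's data).
DECLARE @a time(0) = '12:00:59', @b time(0) = '12:05:00';  -- 4 min 1 s apart
DECLARE @c time(0) = '12:00:00', @d time(0) = '12:05:59';  -- 5 min 59 s apart

SELECT
     DATEDIFF(minute, @a, @b) AS minutes_ab   -- 5: five minute boundaries crossed
    ,DATEDIFF(second, @a, @b) AS seconds_ab   -- 241, which is less than 5 * 60
    ,DATEDIFF(minute, @c, @d) AS minutes_cd   -- 5: same count for a longer gap
    ,DATEDIFF(second, @c, @d) AS seconds_cd;  -- 359, which is not less than 5 * 60
```

Both pairs report 5 minutes, yet only the first gap is really under 5 minutes, so a threshold expressed in seconds separates them where a threshold in minutes cannot.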
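Answer 1 (score: 2)
Since you're on SQL Server 2012, you can use window offset functions such as LAG and LEAD. However, @vladimir beat me to it; he and I put together similar solutions.
To make things interesting, I'll show how to optimize the query so that neither LAG nor LEAD forces SQL Server to sort in order to satisfy it. The type of index I'm creating is called a POC index, which is discussed here.
For simplicity, I use a single datetime column, txn_date_time. I'll create two identical tables and run my solution against both. The second table will have a POC index on it.
Sample data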
-- sample data
if object_id('tempdb..#table') is not null drop table #table;
if object_id('tempdb..#table2') is not null drop table #table2;
go
create table #table
(
rowno int identity,
txn_date_time datetime,
accountNo int
);
create table #table2
(
rowno int identity,
txn_date_time datetime,
accountNo int
);
-- populate #table
declare @dt varchar(9) = '20171031 ', @acn int = 98989898;
insert #table (txn_date_time, accountNo)
values
(@dt+'11:50:47',@acn), (@dt+'11:52:23', @acn), (@dt+'11:52:23',@acn),
(@dt+'11:59:03',@acn), (@dt+'12:05:13', @acn), (@dt+'12:41:06',@acn),
(@dt+'12:42:44',@acn), (@dt+'12:44:02', @acn), (@dt+'15:23:19',@acn),(@dt+'16:19:17',@acn);
-- populate #table2
insert #table2 (txn_date_time, accountNo)
select txn_date_time, accountNo from #table;
-- create unique clustered index on #table2
create unique clustered index uq_cl_table2 on #table2(txn_date_time, rowno);
GO
Run the same query against both tables, keeping in mind that the second table has the POC index on it.
-- #table (no index)
select rowno, txn_date_time, accountNo
from
(
  select rowno, txn_date_time, accountNo,
    -- gaps in seconds: datediff(minute, ...) counts minute boundaries crossed, not elapsed minutes
    nextDt = datediff(second, txn_date_time, lead(txn_date_time, 1) over (order by txn_date_time)),
    prevDt = datediff(second, lag(txn_date_time, 1) over (order by txn_date_time), txn_date_time)
  from #table
) fixedDates
where nextDt < 5 * 60 or prevDt < 5 * 60;
-- #table2 (POC index)
select rowno, txn_date_time, accountNo
from
(
  select rowno, txn_date_time, accountNo,
    nextDt = datediff(second, txn_date_time, lead(txn_date_time, 1) over (order by txn_date_time)),
    prevDt = datediff(second, lag(txn_date_time, 1) over (order by txn_date_time), txn_date_time)
  from #table2
) fixedDates
where nextDt < 5 * 60 or prevDt < 5 * 60;
Note the execution plans. In this example, adding the POC index removes the sort and makes the query four times more efficient.
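The queries above have no PARTITION BY, so the index only needs the ordering column. If the window were partitioned per account, as in the first answer, a POC index would (as I understand the pattern: Partitioning, Ordering, Covering) lead with the partitioning column, follow with the ordering columns, and cover the remaining selected columns. The following is only a sketch under that assumption: the table #table3, the index name uq_cl_table3 and the second account number are hypothetical, and whether the sort actually disappears should be confirmed in the execution plan.

```sql
-- Hypothetical third table with the same shape as #table and #table2.
if object_id('tempdb..#table3') is not null drop table #table3;

create table #table3
(
  rowno         int identity,
  txn_date_time datetime,
  accountNo     int
);

-- Made-up rows: account 12121212 is not part of the question's data.
insert #table3 (txn_date_time, accountNo)
values ('20171031 11:50:47', 98989898), ('20171031 11:52:23', 98989898),
       ('20171031 11:59:03', 98989898), ('20171031 11:51:10', 12121212),
       ('20171031 11:53:40', 12121212);

-- P: partitioning column first, O: ordering columns next.
-- C: a clustered index contains every column, so it covers the query.
create unique clustered index uq_cl_table3 on #table3(accountNo, txn_date_time, rowno);

select rowno, txn_date_time, accountNo
from
(
  select rowno, txn_date_time, accountNo,
    nextDt = datediff(second, txn_date_time,
               lead(txn_date_time, 1) over (partition by accountNo order by txn_date_time, rowno)),
    prevDt = datediff(second,
               lag(txn_date_time, 1) over (partition by accountNo order by txn_date_time, rowno),
               txn_date_time)
  from #table3
) fixedDates
where nextDt < 5 * 60 or prevDt < 5 * 60;
```

With these rows, the 11:59:03 transaction is left out because it is more than 5 minutes after the previous transaction on the same account, while the other four rows each have a neighbor on their own account within 5 minutes.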
Answer 2 (score: 1)
Try a self-join to attach the next row, then UNION that with a second query that self-joins to the previous row:
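-- Assumes rowno is gap-free and follows txn_time order.
-- Rows whose next row is less than 5 minutes later
SELECT
     t1.rowno
    ,t1.txn_Date_time
    ,t1.txn_time
    ,t1.accountNo
FROM [table] t1
JOIN [table] t2
    ON t2.rowno = t1.rowno + 1
WHERE DATEDIFF(SECOND, t1.txn_time, t2.txn_time) < 5 * 60
UNION
-- Rows whose previous row is less than 5 minutes earlier
SELECT
     t1.rowno
    ,t1.txn_Date_time
    ,t1.txn_time
    ,t1.accountNo
FROM [table] t1
JOIN [table] t2
    ON t2.rowno = t1.rowno - 1
WHERE DATEDIFF(SECOND, t2.txn_time, t1.txn_time) < 5 * 60;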
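The join on rowno + 1 and rowno - 1 relies on rowno being gap-free and in txn_time order. Where that can't be assumed, one option is to number the rows with ROW_NUMBER first and join on that number instead. This is a minimal sketch, not part of the answer itself: the table variable @txn, the alias names and the rows (whose rowno values deliberately have gaps) are all made up for illustration.

```sql
-- Hypothetical data: rowno has gaps, so joining on rowno + 1 would miss neighbors.
DECLARE @txn TABLE (rowno int, txn_time time(0), accountNo int);

INSERT @txn (rowno, txn_time, accountNo)
VALUES (10, '11:50:47', 98989898), (20, '11:52:23', 98989898),
       (35, '11:59:03', 98989898), (40, '12:41:06', 98989898),
       (41, '12:42:44', 98989898);

WITH numbered AS
(
    -- rn is contiguous per account regardless of gaps in rowno
    SELECT
         rowno
        ,txn_time
        ,accountNo
        ,ROW_NUMBER() OVER (PARTITION BY accountNo ORDER BY txn_time, rowno) AS rn
    FROM @txn
)
SELECT DISTINCT
     cur.rowno
    ,cur.txn_time
    ,cur.accountNo
FROM numbered cur
JOIN numbered nb
    ON  nb.accountNo = cur.accountNo
    AND nb.rn IN (cur.rn - 1, cur.rn + 1)   -- previous or next row
WHERE ABS(DATEDIFF(second, cur.txn_time, nb.txn_time)) < 5 * 60
ORDER BY cur.txn_time, cur.rowno;
```

For these made-up rows the query returns rowno 10, 20, 40 and 41; row 35 is dropped because both of its neighbors are more than 5 minutes away. DISTINCT plays the role of UNION here, since a row can match both its previous and its next neighbor.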