I have the following sample data:
Id  UserId  EventDate   EventId  EventTime                Fg  RN
1   1       2018-10-15  6        2018-10-15 12:10:10.000  0   8
2   1       2018-10-15  6        2018-10-15 12:10:11.000  0   7
3   1       2018-10-15  6        2018-10-15 12:10:12.000  1   6
4   1       2018-10-15  6        2018-10-15 12:10:13.000  1   5
5   1       2018-10-15  6        2018-10-15 12:10:15.000  0   4
6   1       2018-10-15  6        2018-10-15 12:10:17.000  0   3
7   1       2018-10-15  6        2018-10-15 12:10:20.000  1   2
8   1       2018-10-15  6        2018-10-15 12:10:25.000  1   1
9   1       2018-10-16  8        2018-10-16 12:12:33.000  0   3
10  1       2018-10-16  8        2018-10-16 12:12:43.000  0   2
11  1       2018-10-16  8        2018-10-16 12:12:47.000  1   1
12  1       2018-10-17  9        2018-10-17 12:15:10.000  0   4
13  1       2018-10-17  9        2018-10-17 12:15:15.000  0   3
14  1       2018-10-17  9        2018-10-17 12:15:18.000  1   2
15  1       2018-10-17  9        2018-10-17 12:15:25.000  1   1
I want to select the following rows:
Id  UserId  EventDate   EventId  EventTime                Fg  RN
7   1       2018-10-15  6        2018-10-15 12:10:20.000  1   2
11  1       2018-10-16  8        2018-10-16 12:12:47.000  1   1
14  1       2018-10-17  9        2018-10-17 12:15:18.000  1   2
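Each of these is the row that immediately follows the first Fg = 0 row found when scanning backward from the end of its group. The end of a group is the row where RN = 1.
To clarify further, RN is the order of the events for each UserId on each day, counted from the end.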
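For illustration, a column like RN could be produced as follows. This is only a sketch of the definition given above, run against the dbo.Test table from the DDL in this question; the question does not say how RN is actually populated, so the ORDER BY EventTime DESC is an assumption.

```sql
-- Assumed derivation of RN: number the events of each user on each day,
-- starting from the latest one (RN = 1 marks the end of the group).
SELECT Id, UserId, EventDate, EventTime, Fg,
       ROW_NUMBER() OVER (PARTITION BY UserId, EventDate
                          ORDER BY EventTime DESC) AS RN
FROM dbo.Test;
```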
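The solution I am currently using is a recursive CTE that walks backward through the rows, but it is quite slow for the volume of data I have.
What are the alternatives?
Here is the DDL: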
CREATE TABLE dbo.test (Id INT IDENTITY(1,1), UserId INT, EventDate DATE, EventId INT, EventTime DATETIME NOT NULL, Fg BIT, RN BIGINT)
GO
INSERT INTO dbo.Test(UserId, EventDate, EventTime, Fg, RN, EventId )
VALUES
(1, '20181015','20181015 12:10:10', 0, 8, 6)
,(1, '20181015','20181015 12:10:11', 0, 7, 6)
,(1, '20181015','20181015 12:10:12', 1, 6, 6)
,(1, '20181015','20181015 12:10:13', 1, 5, 6)
,(1, '20181015','20181015 12:10:15', 0, 4, 6)
,(1, '20181015','20181015 12:10:17', 0, 3, 6)
,(1, '20181015','20181015 12:10:20', 1, 2, 6)
,(1, '20181015','20181015 12:10:25', 1, 1, 6)
,(1, '20181016','20181016 12:12:33', 0, 3, 8)
,(1, '20181016','20181016 12:12:43', 0, 2, 8)
,(1, '20181016','20181016 12:12:47', 1, 1, 8)
,(1, '20181017','20181017 12:15:10', 0, 4, 9)
,(1, '20181017','20181017 12:15:15', 0, 3, 9)
,(1, '20181017','20181017 12:15:18', 1, 2, 9)
,(1, '20181017','20181017 12:15:25', 1, 1, 9)
GO
Answer 0 (score: 3)
You can use window functions here to help identify the correct records:
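SELECT *
FROM
(
    -- Number the rows that share the same lagcheck value, latest event first
    SELECT *, ROW_NUMBER() OVER (PARTITION BY EventId, lagcheck ORDER BY RN) AS lagCheckRow
    FROM
    (
        -- Fg of the previous row in time order (RN DESC runs from oldest to newest)
        SELECT test.*, LAG(Fg) OVER (PARTITION BY EventId ORDER BY RN DESC) AS lagcheck
        FROM test
    ) t1
) t2
WHERE lagCheckRow = 1 AND lagcheck = 0;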
+----+--------+---------------------+---------+---------------------+------+----+----------+-------------+
| Id | UserId | EventDate | EventId | EventTime | Fg | RN | lagcheck | lagCheckRow |
+----+--------+---------------------+---------+---------------------+------+----+----------+-------------+
| 7 | 1 | 15.10.2018 00:00:00 | 6 | 15.10.2018 12:10:20 | True | 2 | 0 | 1 |
| 11 | 1 | 16.10.2018 00:00:00 | 8 | 16.10.2018 12:12:47 | True | 1 | 0 | 1 |
| 14 | 1 | 17.10.2018 00:00:00 | 9 | 17.10.2018 12:15:18 | True | 2 | 0 | 1 |
+----+--------+---------------------+---------+---------------------+------+----+----------+-------------+
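The innermost query uses LAG() to fetch the Fg of the previous record, where "previous" means the row just before it in time, since the window is ordered by RN DESC. We need that value to be 0 (filtered in the outermost query). We then apply ROW_NUMBER() to number the rows within each LAG() result, ordered by RN so that the row closest to the end comes first, and we keep the one whose row number is 1.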
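To make the LAG() step concrete, here is the innermost query on its own, restricted to EventId = 6 from the sample data. The result in the comments was worked out by hand from the rows in the question, not copied from a server run.

```sql
-- Innermost step only: pair each row with the Fg of the row before it in time.
SELECT Id, Fg, RN,
       LAG(Fg) OVER (PARTITION BY EventId ORDER BY RN DESC) AS lagcheck
FROM dbo.Test
WHERE EventId = 6
ORDER BY RN DESC;

-- Id  Fg  RN  lagcheck
-- 1   0   8   NULL
-- 2   0   7   0
-- 3   1   6   0
-- 4   1   5   1
-- 5   0   4   1
-- 6   0   3   0
-- 7   1   2   0     <- lagcheck = 0 with the smallest RN, so this row is kept
-- 8   1   1   1
```

Four rows have lagcheck = 0 here (Id 2, 3, 6 and 7); ordering them by RN puts Id 7 first, which matches the expected output.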
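This should be considerably faster than a recursive CTE, because it only needs roughly four passes over the data (the initial result set, the first window function, the second window function, and the final WHERE predicate).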
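If RN has no gaps within a group, the same rows can probably be found with a single window aggregate. The sketch below is a different technique from the LAG()/ROW_NUMBER() query, not part of it: it assumes RN is contiguous per UserId and EventDate (as in the sample data), and the alias LastZeroRN is made up for illustration.

```sql
-- Alternative sketch, assuming RN is gap-free within each (UserId, EventDate).
-- LastZeroRN is the smallest RN that has Fg = 0, i.e. the last Fg = 0 row in time.
SELECT Id, UserId, EventDate, EventId, EventTime, Fg, RN
FROM
(
    SELECT t.*,
           MIN(CASE WHEN Fg = 0 THEN RN END)
               OVER (PARTITION BY UserId, EventDate) AS LastZeroRN
    FROM dbo.Test AS t
) x
-- The wanted row sits one position closer to the end than the last Fg = 0 row.
WHERE RN = LastZeroRN - 1;
```

On the sample data this picks RN 2, 1 and 2 for the three days, i.e. Id 7, 11 and 14. A group with no Fg = 0 row, or one whose final row has Fg = 0, returns nothing with this version, so check that this is the behavior you want before relying on it.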
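Quick note: if this table holds multiple UserId values (which it certainly does) and the query has to run across several of them, add UserId to each PARTITION BY so that you don't get wonky answers.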
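With that change applied, the query might look like the sketch below. Only the two PARTITION BY clauses differ from the query above; the dbo.Test name comes from the DDL in the question.

```sql
SELECT *
FROM
(
    -- UserId added so that row numbering restarts for every user
    SELECT t1.*,
           ROW_NUMBER() OVER (PARTITION BY UserId, EventId, lagcheck
                              ORDER BY RN) AS lagCheckRow
    FROM
    (
        -- UserId added so that LAG() never reads a row belonging to another user
        SELECT t.*,
               LAG(Fg) OVER (PARTITION BY UserId, EventId
                             ORDER BY RN DESC) AS lagcheck
        FROM dbo.Test AS t
    ) t1
) t2
WHERE lagCheckRow = 1 AND lagcheck = 0;
```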