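I'm building an ADLA (Azure Data Lake Analytics) job to calculate the duration of events. An event is a set of log entries from the same source, delimited by a "start" entry and an "end" entry. I need a way to group these log entries together so that I can use window functions to compute all the information I need. I currently have the following code: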
@events =
    SELECT *
    FROM(
        VALUES
        ("AABBCCDDEEF1", "2019-01-01 00:00:00", "room1", "start"),
        ("AABBCCDDEEF2", "2019-01-01 00:00:10", "room1", "additional information"),
        ("AABBCCDDEEF3", "2019-01-01 00:00:20", "room1", "additional information"),
        ("AABBCCDDEEF4", "2019-01-01 00:00:30", "room1", "additional information"),
        ("AABBCCDDEEF5", "2019-01-01 00:00:40", "room1", "accepted"),
        ("AABBCCDDEEF6", "2019-01-01 00:00:50", "room1", "end"),
        ("AABBCCDDEEF7", "2019-01-01 00:01:00", "room1", "start"),
        ("AABBCCDDEEF8", "2019-01-01 00:01:10", "room1", "additional information"),
        ("AABBCCDDEEF9", "2019-01-01 00:01:20", "room1", "additional information"),
        ("AABBCCDDEEFB", "2019-01-01 00:01:30", "room1", "accepted"),
        ("AABBCCDDEEFC", "2019-01-01 00:01:40", "room1", "end")
    ) AS T(id, eventDate, source, message);

@result =
    SELECT
        *,
        SUM(message.ToLower().Contains("start") ? 1 : 0) OVER(PARTITION BY source) AS groupId
    FROM @events
    WHERE (
        message.ToLower().Contains("start")
        OR message.ToLower().Contains("accepted")
        OR message.ToLower().Contains("end")
    );

OUTPUT @result
TO "/output/result.csv"
USING Outputters.Csv();
What I want is for the output to look something like this:
"AABBCCDDEEF1","2019-01-01 00:00:00","room1","start",1
"AABBCCDDEEF5","2019-01-01 00:00:40","room1","accepted",1
"AABBCCDDEEF6","2019-01-01 00:00:50","room1","end",1
"AABBCCDDEEF7","2019-01-01 00:01:00","room1","start",2
"AABBCCDDEEFB","2019-01-01 00:01:30","room1","accepted",2
"AABBCCDDEEFC","2019-01-01 00:01:40","room1","end",2
After that, I could do something like:
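@timeDiff =
    SELECT
        *,
        // eventDate is the column name defined in @events; ORDER BY makes "next" well defined
        LEAD(eventDate, 1) OVER(PARTITION BY groupId ORDER BY eventDate) AS nextDate
    FROM @result;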
Please note that I'm a complete newbie to ADLA, so any suggestions for a different approach are welcome too.
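For reference, here is a minimal, untested sketch of one possible direction. The SUM in the @result query has no ORDER BY in its OVER clause, so it returns the total number of "start" rows in the partition on every row (2 for the sample data) rather than an incrementing group number. Adding an ORDER BY and a ROWS frame should turn it into a running count. This assumes the eventDate strings sort chronologically, which holds for the format used in the sample data, and that each event's rows for a source do not interleave with another event from the same source.

```
// Sketch only: running count of "start" rows per source, ordered by time.
// Assumption: eventDate is a string in "yyyy-MM-dd HH:mm:ss" form, so string order is time order.
@result =
    SELECT
        *,
        SUM(message.ToLower().Contains("start") ? 1 : 0)
            OVER(PARTITION BY source
                 ORDER BY eventDate
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS groupId
    FROM @events
    WHERE (
        message.ToLower().Contains("start")
        OR message.ToLower().Contains("accepted")
        OR message.ToLower().Contains("end")
    );
```

With more than one source, groupId restarts at 1 for each source, so any later window would probably need PARTITION BY source, groupId rather than groupId alone.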