Grouping ADLA events based on state

Date: 2019-01-24 15:09:46

Tags: azure-data-lake u-sql

I am building an ADLA job to calculate the duration of events. An event is a group of log entries marked with "start" and "end", together with a source. I need a way to group the log entries together so that I can use window functions to calculate all the required information. I currently have the following code:

    @events =
    SELECT *
    FROM(
        VALUES
        ("AABBCCDDEEF1", "2019-01-01 00:00:00", "room1",  "start"),
        ("AABBCCDDEEF2", "2019-01-01 00:00:10", "room1", "additional information"),
        ("AABBCCDDEEF3", "2019-01-01 00:00:20", "room1", "additional information"),
        ("AABBCCDDEEF4", "2019-01-01 00:00:30", "room1", "additional information"),
        ("AABBCCDDEEF5", "2019-01-01 00:00:40", "room1", "accepted" ),
        ("AABBCCDDEEF6", "2019-01-01 00:00:50", "room1", "end"),
        ("AABBCCDDEEF7", "2019-01-01 00:01:00", "room1", "start"),
        ("AABBCCDDEEF8", "2019-01-01 00:01:10", "room1", "additional information"),
        ("AABBCCDDEEF9", "2019-01-01 00:01:20", "room1", "additional information"),
        ("AABBCCDDEEFB", "2019-01-01 00:01:30", "room1", "accepted"),
        ("AABBCCDDEEFC", "2019-01-01 00:01:40", "room1", "end")
    ) AS T(id, eventDate, source, message);

    @result =
        SELECT 
            *,
            SUM(message.ToLower().Contains("start") ? 1 : 0) OVER(PARTITION BY source) AS groupId
        FROM @events
        WHERE (
            message.ToLower().Contains("start") 
            OR message.ToLower().Contains("accepted") 
            OR message.ToLower().Contains("end")
        );

    OUTPUT @result
    TO "/output/result.csv"
    USING Outputters.Csv();

So I want the output to become something like this:

    "AABBCCDDEEF1","2019-01-01 00:00:00","room1","start",1
    "AABBCCDDEEF5","2019-01-01 00:00:40","room1","accepted",1
    "AABBCCDDEEF6","2019-01-01 00:00:50","room1","end",1
    "AABBCCDDEEF7","2019-01-01 00:01:00","room1","start",2
    "AABBCCDDEEFB","2019-01-01 00:01:30","room1","accepted",2
    "AABBCCDDEEFC","2019-01-01 00:01:40","room1","end",2

After that, I could do something like:

    @timeDiff =
        SELECT 
            *,
            LEAD(eventDate, 1) OVER(PARTITION BY groupId ORDER BY eventDate) AS nextDate
        FROM @result;

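As an aside on the duration step: once a correct incremental `groupId` is available on `@result`, the per-group duration could also be computed with a plain `GROUP BY` instead of `LEAD`. A minimal sketch of that idea (this is an assumption on my part, not code from the original question; it also assumes `eventDate` parses with `DateTime.Parse`):

    // Collapse each group to its first and last timestamp.
    @durations =
        SELECT
            source,
            groupId,
            MIN(DateTime.Parse(eventDate)) AS startDate,
            MAX(DateTime.Parse(eventDate)) AS endDate
        FROM @result
        GROUP BY source, groupId;

    // Subtracting two DateTime values yields a TimeSpan,
    // so the duration in seconds is TotalSeconds.
    @withDuration =
        SELECT
            source,
            groupId,
            startDate,
            endDate,
            (endDate - startDate).TotalSeconds AS durationSeconds
        FROM @durations;

This sidesteps the need for `LEAD` entirely, at the cost of losing the per-row detail within each group.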
Note that I am a complete newbie with ADLA, so any suggestions for a different approach are also welcome.

0 Answers:

There are no answers yet.