Flink Streaming Windowing - the last event of each window belongs to the next window

Date: 2016-11-06 10:56:40

Tags: apache-flink flink-streaming windowing

I am using Flink 1.2-SNAPSHOT. My data looks like this:

  • id = 25398102,sourceId = 1,ts = 2016-10-15 00:00:56,user = 14,value = 919
  • id = 25398185,sourceId = 1,ts = 2016-10-15 00:01:06,user = 14,value = 920
  • id = 25398210,sourceId = 1,ts = 2016-10-15 00:01:16,user = 14,value = 944
  • id = 25398235,sourceId = 1,ts = 2016-10-15 00:01:24,user = 3149,value = 944
  • id = 25398236,sourceId = 1,ts = 2016-10-15 00:01:25,user = 71,value = 955
  • id = 25398239,sourceId = 1,ts = 2016-10-15 00:01:26,user = 71,value = 955
  • id = 25398265,sourceId = 1,ts = 2016-10-15 00:01:36,user = 71,value = 955
  • id = 25398310,sourceId = 1,ts = 2016-10-15 00:02:16,user = 14,value = 960
  • id = 25398320,sourceId = 1,ts = 2016-10-15 00:02:26,user = 14,value = 1000

I am running the following code to create windows based on the user id:

    stream.flatMap(new LogsParser())
            .assignTimestampsAndWatermarks(new MessageTimestampExtractor())
            .keyBy("sourceId")
            .window(GlobalWindows.create())
            .trigger(PurgingTrigger.of(new MySessionTrigger()))
            .apply(new SessionWindowFunction())
            .print();

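MySessionTrigger looks at each incoming event and checks its user id, firing the window when the user id changes. SessionWindowFunction simply creates a session out of the contents of the window.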

These are the sessions that get created:

  1. Session:

    • id = 25398102,sourceId = 1,ts = 2016-10-15 00:00:56,user = 14,value = 919
    • id = 25398185,sourceId = 1,ts = 2016-10-15 00:01:06,user = 14,value = 920
    • id = 25398210,sourceId = 1,ts = 2016-10-15 00:01:16,user = 14,value = 944
    • id = 25398235,sourceId = 1,ts = 2016-10-15 00:01:24,user = 3149,value = 944
  2. Session:

    • id = 25398236,sourceId = 1,ts = 2016-10-15 00:01:25,user = 71,value = 955
    • id = 25398239,sourceId = 1,ts = 2016-10-15 00:01:26,user = 71,value = 955
    • id = 25398265,sourceId = 1,ts = 2016-10-15 00:01:36,user = 71,value = 955
    • id = 25398310,sourceId = 1,ts = 2016-10-15 00:02:16,user = 14,value = 960
  3. Session:

    • id = 25398320,sourceId = 1,ts = 2016-10-15 00:02:26,user = 14,value = 1000
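
As you can see, the problem is that the last event in each session actually belongs to the next window. Because that event is already in the window when the trigger evaluates it, the decision to fire the window comes too late.

How can I fire the window without including the last event in that window?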
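
To make the behaviour concrete, here is a plain-Java simulation of it (not Flink code). It assumes that the trigger only sees an event after the event has been added to the window, and, hypothetically, that MySessionTrigger forgets the remembered user id whenever it fires. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of the observed behaviour; no Flink API is used.
class LateTriggerSimulation {

    // Each row is {id, user}, taken from the sample data above.
    static final long[][] SAMPLE = {
        {25398102L, 14}, {25398185L, 14}, {25398210L, 14},
        {25398235L, 3149},
        {25398236L, 71}, {25398239L, 71}, {25398265L, 71},
        {25398310L, 14}, {25398320L, 14}
    };

    static List<List<Long>> run(long[][] events) {
        List<List<Long>> fired = new ArrayList<>();
        List<Long> window = new ArrayList<>();
        Long lastUser = null; // assumed trigger state
        for (long[] e : events) {
            // The element is stored in the window before the trigger decides.
            window.add(e[0]);
            boolean changed = lastUser != null && lastUser.longValue() != e[1];
            if (changed) {
                // Fire and purge: the emitted window already contains e.
                fired.add(window);
                window = new ArrayList<>();
                lastUser = null; // assumption: state is cleared on fire
            } else {
                lastUser = e[1];
            }
        }
        // Whatever is still buffered at the end of the sample input.
        if (!window.isEmpty()) {
            fired.add(window);
        }
        return fired;
    }

    public static void main(String[] args) {
        for (List<Long> session : run(SAMPLE)) {
            System.out.println(session);
        }
    }
}
```

Under these assumptions the simulation groups the sample ids exactly as in the three sessions listed above, with the event that caused the fire ending up as the last element of the emitted window.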

1 Answer:

Answer 0 (score: 0)

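One idea is to use a flatMap to insert a marker into the stream whenever the user id changes. Your trigger can then fire as soon as it sees one of these markers, and your session window function can filter the markers out.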
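
A minimal sketch of that idea in plain Java, as a simulation rather than actual Flink code: `insertMarkers` plays the role of the flatMap, and `sessions` plays the role of the trigger plus window function. The `Event` type, the use of a null user to denote a marker, and all method names are hypothetical. In a real job the marker would presumably also need to carry the same `sourceId` so that it reaches the same keyed window, and the flatMap would need to see each key's events in order.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of the marker idea; no Flink API is used.
class MarkerSessions {

    // Hypothetical event type: user == null denotes an inserted marker.
    static final class Event {
        final long id;
        final Integer user;
        Event(long id, Integer user) { this.id = id; this.user = user; }
        boolean isMarker() { return user == null; }
    }

    // Role of the flatMap: emit a marker before the first event of a new user.
    static List<Event> insertMarkers(List<Event> input) {
        List<Event> out = new ArrayList<>();
        Integer lastUser = null;
        for (Event e : input) {
            if (lastUser != null && !lastUser.equals(e.user)) {
                out.add(new Event(-1L, null)); // marker closes the previous session
            }
            out.add(e);
            lastUser = e.user;
        }
        return out;
    }

    // Role of trigger + window function: fire and purge on a marker, drop the marker.
    static List<List<Long>> sessions(List<Event> withMarkers) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> window = new ArrayList<>();
        for (Event e : withMarkers) {
            if (e.isMarker()) {
                if (!window.isEmpty()) result.add(window);
                window = new ArrayList<>();
            } else {
                window.add(e.id);
            }
        }
        // Flushed here only because the sample input ends; a real trigger
        // would keep waiting for the next marker.
        if (!window.isEmpty()) result.add(window);
        return result;
    }

    // Ids and users from the question's sample data.
    static List<Event> sample() {
        long[][] rows = {
            {25398102L, 14}, {25398185L, 14}, {25398210L, 14}, {25398235L, 3149},
            {25398236L, 71}, {25398239L, 71}, {25398265L, 71},
            {25398310L, 14}, {25398320L, 14}
        };
        List<Event> events = new ArrayList<>();
        for (long[] r : rows) events.add(new Event(r[0], (int) r[1]));
        return events;
    }

    public static void main(String[] args) {
        System.out.println(sessions(insertMarkers(sample())));
    }
}
```

Because the marker arrives before the first event of the new user, the window that fires contains only the previous user's events. On the sample data this yields four sessions: the three events of user 14, the single event of user 3149, the three events of user 71, and the final two events of user 14.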