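Windows for out-of-order event-time events

Time: 2018-05-20 17:53:29

Tags: apache-flink flink-streaming

In the simple Flink program below, I have 3 event-time events, spaced 1 second apart. They are fed to Flink out of order: 2, 1, 3. I noticed that when I change the argument of timeWindowAll, sometimes all 3 events are printed and sometimes only events 2 and 3 are printed.

Some examples: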

.timeWindowAll(Time.seconds(3)) --> 2, 3
.timeWindowAll(Time.seconds(4)) --> 2, 1, 3
.timeWindowAll(Time.seconds(5)) --> 2, 3
.timeWindowAll(Time.seconds(6)) --> 2, 3
.timeWindowAll(Time.seconds(7)) --> 2, 1, 3
.timeWindowAll(Time.seconds(8)) --> 2, 1, 3
.timeWindowAll(Time.seconds(9)) --> 2, 1, 3
.timeWindowAll(Time.seconds(10)) --> 2, 3
...

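Can someone explain why this happens? I suspect it is related to the start time of the window, with event 1 already being late by the time it arrives. So, given the "size" passed to timeWindowAll, how can I know the start time of each window?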

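import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.ProcessAllWindowFunction
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

object UnorederedTimeEvents {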
  case class MyEvent(timestamp: Long, str: String)
  class MyAssignerWithPunctuatedWatermarks extends AssignerWithPunctuatedWatermarks[MyEvent] {

    // Emits a watermark equal to each element's own timestamp, so no out-of-orderness is tolerated.
    override def checkAndGetNextWatermark(lastElement: MyEvent, extractedTimestamp: Long): Watermark = new Watermark(extractedTimestamp)

    override def extractTimestamp(element: MyEvent, previousElementTimestamp: Long): Long = element.timestamp
  }

  class MyProcessAllWindowFunction extends ProcessAllWindowFunction[MyEvent, MyEvent, TimeWindow] {
    override def process(context: Context, elements: Iterable[MyEvent], out: Collector[MyEvent]): Unit = {
      elements.foreach(out.collect)
    }
  }

  def main(args: Array[String]): Unit = {

    val events = List(MyEvent(1526056650167L, "2"), MyEvent(1526056649167L, "1"), MyEvent(1526056651167L, "3"))

    println(events.sortBy(_.timestamp))

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    env.setParallelism(1)
    env
      .fromCollection(events)
      .assignTimestampsAndWatermarks(new MyAssignerWithPunctuatedWatermarks)
      .timeWindowAll(Time.seconds(10))
      .process(new MyProcessAllWindowFunction)
      .print()

    env.execute()

  }

}
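
One way to reason about the window start times asked about above is to model the assignment in plain Scala, without Flink. The sketch below assumes tumbling windows aligned to the Unix epoch with offset 0 (which, as far as I know, is what Flink's TimeWindow.getWindowStartWithOffset computes for timeWindowAll), zero allowed lateness, and a watermark that follows the highest timestamp seen so far, as the punctuated assigner in the program does. The names WindowModel, windowStart and accepted are hypothetical and exist only for this illustration. It is a model, not a Flink run, so it may not reproduce every row of the list above.

```scala
import scala.collection.mutable.ListBuffer

// Plain-Scala model of epoch-aligned tumbling windows; no Flink dependency.
object WindowModel {

  // Start of the tumbling window containing `ts`, assuming offset 0 (all values in ms).
  def windowStart(ts: Long, sizeMs: Long): Long = ts - (ts % sizeMs)

  // Replays the events in arrival order and returns the labels that are not dropped as late.
  // Assumption: the watermark equals the highest timestamp seen so far and is advanced
  // right after each element; allowed lateness is zero.
  def accepted(events: Seq[(Long, String)], sizeMs: Long): List[String] = {
    var watermark = Long.MinValue
    val kept = ListBuffer.empty[String]
    for ((ts, label) <- events) {
      // Last timestamp that still belongs to this element's window.
      val lastTsOfWindow = windowStart(ts, sizeMs) + sizeMs - 1
      // The element is kept only if the watermark has not yet passed its window.
      if (lastTsOfWindow > watermark) kept += label
      watermark = math.max(watermark, ts)
    }
    kept.toList
  }

  def main(args: Array[String]): Unit = {
    // Same timestamps and arrival order as in the program above.
    val events = Seq((1526056650167L, "2"), (1526056649167L, "1"), (1526056651167L, "3"))
    for (seconds <- Seq(4L, 10L)) {
      val sizeMs = seconds * 1000
      val starts = events.map { case (ts, label) => s"$label@${windowStart(ts, sizeMs)}" }
      println(s"size=${seconds}s starts=${starts.mkString(",")} kept=${accepted(events, sizeMs).mkString(",")}")
    }
  }
}
```

Under these assumptions, event 1 survives only when it falls into the same window as event 2: with a 4 second window both start at 1526056648000, while with a 10 second window event 1 belongs to the window starting at 1526056640000, which the watermark from event 2 has already passed.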

0 answers:

No answers yet