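In the simple Flink code below, I have 3 events with event-time timestamps spaced 1 second apart. They are fed to Flink out of order: 2, 1, 3. I noticed that when I change the argument of timeWindowAll, sometimes all 3 events are printed and sometimes only events 2 and 3 are printed.
Some examples: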
.timeWindowAll(Time.seconds(3)) --> 2, 3
.timeWindowAll(Time.seconds(4)) --> 2, 1, 3
.timeWindowAll(Time.seconds(5)) --> 2, 3
.timeWindowAll(Time.seconds(6)) --> 2, 3
.timeWindowAll(Time.seconds(7)) --> 2, 1, 3
.timeWindowAll(Time.seconds(8)) --> 2, 1, 3
.timeWindowAll(Time.seconds(9)) --> 2, 1, 3
.timeWindowAll(Time.seconds(10)) --> 2, 3
...
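Can someone explain why this happens? My guess is that it is related to the start time of the window, and that event 1 is already late by the time it arrives. So in this case, given the "size" passed to timeWindowAll, how can I know what the start time of each window is?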
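import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.ProcessAllWindowFunction
import org.apache.flink.streaming.api.watermark.Watermark
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

object UnorederedTimeEvents {

  case class MyEvent(timestamp: Long, str: String)

  // Emits a watermark equal to the timestamp of every element it sees.
  class MyAssignerWithPunctuatedWatermarks extends AssignerWithPunctuatedWatermarks[MyEvent] {
    override def checkAndGetNextWatermark(lastElement: MyEvent, extractedTimestamp: Long): Watermark =
      new Watermark(extractedTimestamp)

    override def extractTimestamp(element: MyEvent, previousElementTimestamp: Long): Long =
      element.timestamp
  }

  // Forwards every element of a fired window unchanged.
  class MyProcessAllWindowFunction extends ProcessAllWindowFunction[MyEvent, MyEvent, TimeWindow] {
    override def process(context: Context, elements: Iterable[MyEvent], out: Collector[MyEvent]): Unit = {
      elements.foreach(out.collect)
    }
  }

  def main(args: Array[String]): Unit = {
    // Arrival order is 2, 1, 3; the event-time order is 1, 2, 3.
    val events = List(
      MyEvent(1526056650167L, "2"),
      MyEvent(1526056649167L, "1"),
      MyEvent(1526056651167L, "3"))
    println(events.sortBy(_.timestamp))

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    env.setParallelism(1)

    env
      .fromCollection(events)
      .assignTimestampsAndWatermarks(new MyAssignerWithPunctuatedWatermarks)
      .timeWindowAll(Time.seconds(10)) // the window size varied in the examples above
      .process(new MyProcessAllWindowFunction)
      .print()

    env.execute()
  }
}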
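
To test the guess about window start times, here is a minimal sketch of the arithmetic in plain Scala, with no Flink dependency. It assumes the tumbling windows are aligned to the epoch with a zero offset, following the formula in Flink's TimeWindow.getWindowStartWithOffset, and that an element is dropped as late when the last timestamp of its window is not after the current watermark (zero allowed lateness). The names WindowStartSketch, windowStart and isLate are hypothetical helpers for illustration, not part of Flink, and the sketch models only the arithmetic, not the runtime.

```scala
object WindowStartSketch {
  // Same arithmetic as Flink's TimeWindow.getWindowStartWithOffset
  // (timestamps, sizes and offsets in milliseconds).
  def windowStart(timestamp: Long, sizeMs: Long, offsetMs: Long = 0L): Long =
    timestamp - (timestamp - offsetMs + sizeMs) % sizeMs

  // Assumption: with zero allowed lateness, an element is late when the
  // last timestamp of its window (end - 1) is not after the watermark.
  def isLate(timestamp: Long, sizeMs: Long, watermark: Long): Boolean =
    windowStart(timestamp, sizeMs) + sizeMs - 1 <= watermark

  def main(args: Array[String]): Unit = {
    val event2 = 1526056650167L // arrives first, so the watermark advances to this value
    val event1 = 1526056649167L // arrives second, behind that watermark
    for (seconds <- Seq(3L, 4L, 10L)) {
      val size = seconds * 1000L
      val start = windowStart(event1, size)
      println(s"size=${seconds}s window=[$start, ${start + size}) late=${isLate(event1, size, event2)}")
    }
  }
}
```

Under these assumptions, a 10-second size puts event 1 in the window [1526056640000, 1526056650000), which has already ended once the watermark reaches 1526056650167, while a 4-second size puts it in [1526056648000, 1526056652000), which is still open.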