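I am computing the maximum over a simple stream, and the result is:
(S1,1000,S1, value: 999)
(S1,2000,S1, value: 41)
The last record is obviously late: new SensorReading("S1", 999, 100L)
Why is it included in the first window (0-1000)?
I expected the first window to fire when SensorReading("S1", 41, 1000L)
arrives.
I am very confused by this result.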
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
env.setParallelism(TrainingBase.parallelism);

DataStream<SensorReading> input = env.fromElements(
        new SensorReading("S1", 35, 500L),
        new SensorReading("S1", 42, 999L),
        new SensorReading("S1", 41, 1000L),
        new SensorReading("S1", 40, 1200L),
        new SensorReading("S1", 23, 1400L),
        new SensorReading("S1", 999, 100L)
);

input.assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<SensorReading>() {
            private long currentMaxTimestamp;

            @Nullable
            @Override
            public Watermark getCurrentWatermark() {
                return new Watermark(currentMaxTimestamp);
            }

            @Override
            public long extractTimestamp(SensorReading element, long previousElementTimestamp) {
                currentMaxTimestamp = element.ts;
                return currentMaxTimestamp;
            }
        })
        .keyBy((KeySelector<SensorReading, String>) value -> value.sensorName)
        .window(TumblingEventTimeWindows.of(Time.seconds(1)))
        .reduce(new MyReducingMax(), new MyWindowFunction())
        .print();

env.execute();
MyReducingMax and MyWindowFunction:
private static class MyReducingMax implements ReduceFunction<SensorReading> {
    public SensorReading reduce(SensorReading r1, SensorReading r2) {
        return r1.getValue() > r2.getValue() ? r1 : r2;
    }
}

private static class MyWindowFunction extends
        ProcessWindowFunction<SensorReading, Tuple3<String, Long, SensorReading>, String, TimeWindow> {
    @Override
    public void process(
            String key,
            Context context,
            Iterable<SensorReading> maxReading,
            Collector<Tuple3<String, Long, SensorReading>> out) {
        SensorReading max = maxReading.iterator().next();
        out.collect(new Tuple3<>(key, context.window().getEnd(), max));
    }
}
public static class SensorReading {
    String sensorName;
    int value;
    Long ts;

    public SensorReading() {
    }

    public SensorReading(String sensorName, int value, Long ts) {
        this.sensorName = sensorName;
        this.value = value;
        this.ts = ts;
    }

    public Long getTs() {
        return ts;
    }

    public void setTs(Long ts) {
        this.ts = ts;
    }

    public String getSensorName() {
        return sensorName;
    }

    public void setSensorName(String sensorName) {
        this.sensorName = sensorName;
    }

    public int getValue() {
        return value;
    }

    public void setValue(int value) {
        this.value = value;
    }

    public String toString() {
        return this.sensorName + "(" + this.ts + ") value: " + this.value;
    }
}
Answer 0 (score: 1)
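An AssignerWithPeriodicWatermarks does not create a watermark at every possible opportunity. Instead, Flink calls this kind of assigner periodically to get the latest watermark, and by default it does so every 200 ms (wall-clock time, not event time). This interval is controlled by ExecutionConfig.setAutoWatermarkInterval(...).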
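This means that all six of your test events have almost certainly been processed before the watermark assigner was called even once. With no watermark emitted yet, the reading with timestamp 100 is not late, so it is assigned to the first window (0-1000) like any other event. Both windows then fire when the bounded input ends and Flink sends a final watermark of Long.MAX_VALUE.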
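To make that sequence concrete, here is a minimal plain-Java sketch. It is a simplified model of tumbling event-time windows, not Flink code: the class name LateWatermarkDemo, the {value, timestamp} array encoding, and the "windowEnd=max" output format are all made up for illustration. It assumes that no watermark is emitted until the input is exhausted, which is what most likely happened in your job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified plain-Java model of tumbling event-time windows. This is not Flink code.
class LateWatermarkDemo {
    static final long WINDOW_SIZE = 1000L;

    // Each event is {value, timestamp}; timestamps are assumed to be non-negative.
    // Returns one "windowEnd=max" entry per window, ordered by window end.
    static List<String> run(long[][] events) {
        TreeMap<Long, Long> maxPerWindow = new TreeMap<>();
        // Assumption: the periodic assigner is never asked for a watermark
        // while the events are being processed, so the watermark never advances.
        final long watermark = Long.MIN_VALUE;

        for (long[] event : events) {
            long value = event[0];
            long ts = event[1];
            long windowEnd = ts - (ts % WINDOW_SIZE) + WINDOW_SIZE;
            // An element is late only if the watermark has already reached
            // the last timestamp of its window (windowEnd - 1).
            if (windowEnd - 1 <= watermark) {
                continue;
            }
            maxPerWindow.merge(windowEnd, value, Math::max);
        }

        // End of the bounded input: a final watermark fires every open window.
        List<String> fired = new ArrayList<>();
        for (Map.Entry<Long, Long> window : maxPerWindow.entrySet()) {
            fired.add(window.getKey() + "=" + window.getValue());
        }
        return fired;
    }

    public static void main(String[] args) {
        long[][] events = {
            {35, 500}, {42, 999}, {41, 1000}, {40, 1200}, {23, 1400}, {999, 100}
        };
        System.out.println(run(events)); // prints [1000=999, 2000=41]
    }
}
```

The two entries match the two results you observed: the value 999 wins the first window because nothing had declared that window complete when it arrived.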
If you want more predictable watermarks, you can use an AssignerWithPunctuatedWatermarks instead.
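As a rough illustration of what per-element watermarks would change, the sketch below models a punctuated assigner in plain Java. It is not Flink code: the class name PunctuatedWatermarkDemo, the {value, timestamp} encoding, and the output strings are hypothetical, and the model assumes a watermark equal to the highest timestamp seen so far is produced after every element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified plain-Java model of per-element (punctuated) watermarks. This is not Flink code.
class PunctuatedWatermarkDemo {
    static final long WINDOW_SIZE = 1000L;

    // Each event is {value, timestamp}; timestamps are assumed to be non-negative.
    // Returns window results and dropped elements in the order they occur.
    static List<String> run(long[][] events) {
        TreeMap<Long, Long> openWindows = new TreeMap<>(); // window end -> running max
        List<String> out = new ArrayList<>();
        long watermark = Long.MIN_VALUE;

        for (long[] event : events) {
            long value = event[0];
            long ts = event[1];
            long windowEnd = ts - (ts % WINDOW_SIZE) + WINDOW_SIZE;

            if (windowEnd - 1 <= watermark) {
                // The window for this element has already been closed by the watermark.
                out.add("dropped late value " + value);
            } else {
                openWindows.merge(windowEnd, value, Math::max);
            }

            // Assumption: a new watermark follows every element and never moves backwards.
            watermark = Math.max(watermark, ts);

            // Fire every window whose last timestamp is covered by the watermark.
            while (!openWindows.isEmpty() && openWindows.firstKey() - 1 <= watermark) {
                Map.Entry<Long, Long> window = openWindows.pollFirstEntry();
                out.add(window.getKey() + "=" + window.getValue());
            }
        }

        // End of the bounded input: fire whatever is still open.
        while (!openWindows.isEmpty()) {
            Map.Entry<Long, Long> window = openWindows.pollFirstEntry();
            out.add(window.getKey() + "=" + window.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        long[][] events = {
            {35, 500}, {42, 999}, {41, 1000}, {40, 1200}, {23, 1400}, {999, 100}
        };
        // prints [1000=42, dropped late value 999, 2000=41]
        System.out.println(run(events));
    }
}
```

In this model the first window fires as soon as the watermark reaches 999, its last timestamp, so it closes on the reading at 999 rather than the one at 1000. The reading with timestamp 100 then arrives behind the watermark and is dropped.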
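By the way, with the watermark assigner written the way it is, every out-of-order event can end up being late. It is more typical to use a BoundedOutOfOrdernessTimestampExtractor, which tolerates a bounded amount of out-of-orderness.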
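The idea behind a bounded out-of-orderness watermark can be sketched in a few lines of plain Java. This is an illustrative model rather than Flink's class: the name BoundedOutOfOrdernessDemo and its methods are hypothetical, and the only rule it encodes is that the watermark trails the highest timestamp seen so far by a fixed bound.

```java
// Simplified plain-Java model of a bounded out-of-orderness watermark. This is not Flink code.
class BoundedOutOfOrdernessDemo {
    private final long maxOutOfOrderness;
    private long maxTimestampSeen = Long.MIN_VALUE;

    BoundedOutOfOrdernessDemo(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    // Track the highest timestamp seen so far; a smaller timestamp never lowers it.
    void observe(long ts) {
        maxTimestampSeen = Math.max(maxTimestampSeen, ts);
    }

    // The watermark lags the highest timestamp by the configured bound.
    long currentWatermark() {
        if (maxTimestampSeen == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // nothing observed yet; also avoids underflow
        }
        return maxTimestampSeen - maxOutOfOrderness;
    }

    // An element is late for its tumbling window if the watermark has already
    // reached the window's last timestamp. Timestamps are assumed non-negative.
    boolean isLate(long ts, long windowSize) {
        long windowEnd = ts - (ts % windowSize) + windowSize;
        return windowEnd - 1 <= currentWatermark();
    }

    public static void main(String[] args) {
        long[] seen = {500, 999, 1000, 1200, 1400};

        BoundedOutOfOrdernessDemo strict = new BoundedOutOfOrdernessDemo(0);
        BoundedOutOfOrdernessDemo tolerant = new BoundedOutOfOrdernessDemo(500);
        for (long ts : seen) {
            strict.observe(ts);
            tolerant.observe(ts);
        }

        System.out.println(strict.isLate(100, 1000));   // prints true  (watermark 1400)
        System.out.println(tolerant.isLate(100, 1000)); // prints false (watermark 900)
    }
}
```

With a bound of 0 the reading at timestamp 100 is late once 1400 has been seen, while a bound of 500 keeps the first window open long enough to accept it. The bound is a trade-off: a larger value tolerates more disorder but delays window results by the same amount.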