Why doesn't Flink drop the late data?

Asked: 2019-03-28 08:14:29

Tags: apache-flink flink-streaming

I'm computing the maximum over a simple stream, and the result is:

(S1,1000,S1, value: 999)

(S1,2000,S1, value: 41)

The last record is clearly late: new SensorReading("S1", 999, 100L)

Why is it counted in the first window (0-1000)?

I expected the first window to be triggered when SensorReading("S1", 41, 1000L) arrived.

This result confuses me.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
env.setParallelism(TrainingBase.parallelism);

DataStream<SensorReading> input = env.fromElements(
        new SensorReading("S1", 35, 500L),
        new SensorReading("S1", 42, 999L),
        new SensorReading("S1", 41, 1000L),
        new SensorReading("S1", 40, 1200L),
        new SensorReading("S1", 23, 1400L),
        new SensorReading("S1", 999, 100L)
);

input.assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<SensorReading>() {
            private long currentMaxTimestamp;

            @Nullable
            @Override
            public Watermark getCurrentWatermark() {
                return new Watermark(currentMaxTimestamp);
            }

            @Override
            public long extractTimestamp(SensorReading element, long previousElementTimestamp) {
                currentMaxTimestamp = element.ts;
                return currentMaxTimestamp;
            }
        })
        .keyBy((KeySelector<SensorReading, String>) value -> value.sensorName)
        .window(TumblingEventTimeWindows.of(Time.seconds(1)))
        .reduce(new MyReducingMax(), new MyWindowFunction())
        .print();

env.execute();

The MyReducingMax and MyWindowFunction implementations:

private static class MyReducingMax implements ReduceFunction<SensorReading> {
        public SensorReading reduce(SensorReading r1, SensorReading r2) {
            return r1.getValue() > r2.getValue() ? r1 : r2;
        }
    }

private static class MyWindowFunction extends
            ProcessWindowFunction<SensorReading, Tuple3<String, Long, SensorReading>, String, TimeWindow> {

        @Override
        public void process(
                String key,
                Context context,
                Iterable<SensorReading> maxReading,
                Collector<Tuple3<String, Long, SensorReading>> out) {

            SensorReading max = maxReading.iterator().next();
            out.collect(new Tuple3<>(key, context.window().getEnd(), max));
        }
    }

    public static class SensorReading {
        String sensorName;
        int value;
        Long ts;

        public SensorReading() {
        }

        public SensorReading(String sensorName, int value, Long ts) {
            this.sensorName = sensorName;
            this.value = value;
            this.ts = ts;
        }

        public Long getTs() {
            return ts;
        }

        public void setTs(Long ts) {
            this.ts = ts;
        }

        public String getSensorName() {
            return sensorName;
        }

        public void setSensorName(String sensorName) {
            this.sensorName = sensorName;
        }

        public int getValue() {
            return value;
        }

        public void setValue(int value) {
            this.value = value;
        }

        @Override
        public String toString() {
            return this.sensorName + "(" + this.ts + ") value: " + this.value;
        }
    }

1 Answer:

Answer 0 (score: 1)

An AssignerWithPeriodicWatermarks does not create a watermark at every possible opportunity. Instead, Flink calls such an assigner periodically to get the latest watermark, and by default this happens every 200 ms (of real time, not event time). The interval is controlled by ExecutionConfig.setAutoWatermarkInterval(...).
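For illustration, the polling interval could be shortened via the execution config (a sketch; the 50 ms value is arbitrary):

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
// Ask Flink to call getCurrentWatermark() every 50 ms of wall-clock
// time instead of the default 200 ms.
env.getConfig().setAutoWatermarkInterval(50);
```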

This means that almost certainly all six test events have already been processed before the watermark assigner is even called once.
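A tiny pure-Java simulation (no Flink involved, timestamps taken from the question) shows what the periodic assigner sees: by the time the first periodic call would happen, every record has already passed through extractTimestamp, and because the assigner overwrites currentMaxTimestamp rather than taking a maximum, the late 100L record even pulls the value back down.

```java
import java.util.List;

public class PeriodicWatermarkSim {
    public static void main(String[] args) {
        long currentMaxTimestamp = Long.MIN_VALUE;
        // Arrival order of the six records from the question.
        List<Long> arrivals = List.of(500L, 999L, 1000L, 1200L, 1400L, 100L);
        for (long ts : arrivals) {
            // Mirrors the question's extractTimestamp(): overwrite,
            // not Math.max(currentMaxTimestamp, ts).
            currentMaxTimestamp = ts;
        }
        // Only now (about 200 ms of wall-clock time later) would Flink
        // invoke getCurrentWatermark() for the first time.
        System.out.println("first periodic watermark = " + currentMaxTimestamp);
        // prints: first periodic watermark = 100
    }
}
```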

If you want more predictable watermarks, you can use an AssignerWithPunctuatedWatermarks instead.
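A sketch of what that could look like with the question's SensorReading type (a punctuated assigner is asked for a watermark after every record, so triggering no longer depends on the 200 ms wall-clock interval):

```java
input.assignTimestampsAndWatermarks(new AssignerWithPunctuatedWatermarks<SensorReading>() {
    @Override
    public long extractTimestamp(SensorReading element, long previousElementTimestamp) {
        return element.ts;
    }

    @Nullable
    @Override
    public Watermark checkAndGetNextWatermark(SensorReading lastElement, long extractedTimestamp) {
        // Emit a watermark with every record. The first window fires
        // once a watermark of 999 or more has been emitted, so the late
        // (S1, 999, 100L) record arrives after its window has closed
        // and is dropped.
        return new Watermark(extractedTimestamp);
    }
})
```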

By the way, given how this watermark assigner is written (it overwrites currentMaxTimestamp rather than taking a maximum), any out-of-order event can pull the watermark backwards and end up being late. It is more typical to use a BoundedOutOfOrdernessTimestampExtractor, which tolerates a bounded amount of out-of-orderness.
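That could look like the following sketch (the 500 ms bound is chosen only for illustration):

```java
input.assignTimestampsAndWatermarks(
        // The watermark trails the highest timestamp seen so far by the
        // given bound, so events up to 500 ms out of order still land
        // in the correct window before it fires.
        new BoundedOutOfOrdernessTimestampExtractor<SensorReading>(Time.milliseconds(500)) {
            @Override
            public long extractTimestamp(SensorReading element) {
                return element.ts;
            }
        })
```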