Apache Flink - Event-time-based watermark generator - best strategy

Time: 2019-01-30 11:20:43

Tags: apache-flink flink-streaming

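I'm new to Flink and am trying to apply windowing. My source is Kafka, and my model contains no event-time information, so I want to use the Kafka record timestamp together with the assignTimestampsAndWatermarks() method.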

I implemented two timestamp assigners, shown below.

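import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Event-time watermark: trails the highest Kafka timestamp seen so far by maxOutOfOrderness.
public class TimestampAssigner1 implements AssignerWithPeriodicWatermarks<String> {
    // Static: one logger per class, not serialized with the assigner instance.
    private static final Logger logger = LoggerFactory.getLogger(TimestampAssigner1.class);

    private static final long serialVersionUID = 1L;
    private long currentMaxTimestamp;
    private final long maxOutOfOrderness = 3500; // 3.5 seconds

    @Override
    public long extractTimestamp(String element, long previousElementTimestamp) {
        // previousElementTimestamp is the timestamp already attached to the record (here, the Kafka timestamp).
        currentMaxTimestamp = Math.max(previousElementTimestamp, currentMaxTimestamp);
        logger.info("TimestampAssigner1 - currentMaxTimestamp : {} res : {}, element : {} ", currentMaxTimestamp, previousElementTimestamp, element);
        return previousElementTimestamp;
    }

    @Override
    public Watermark getCurrentWatermark() {
        // The watermark only moves forward when extractTimestamp() has seen a newer element.
        Watermark watermarkRes = new Watermark(currentMaxTimestamp - maxOutOfOrderness);
        //Watermark watermarkRes = new Watermark(currentMaxTimestamp );
        //Watermark watermarkRes = new Watermark(System.currentTimeMillis() );
        //logger.info(String.format("watermarkRes : %s , this : %s ", watermarkRes, this));
        return watermarkRes;
    }
}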


// Wall-clock watermark: ignores the element timestamps and follows the system time.
public class TimestampAssigner2 implements AssignerWithPeriodicWatermarks<String> {
    // Static: one logger per class, not serialized with the assigner instance.
    private static final Logger logger = LoggerFactory.getLogger(TimestampAssigner2.class);

    private static final long serialVersionUID = 1L;
    private long currentMaxTimestamp;
    private final long maxOutOfOrderness = 3500; // 3.5 seconds

    @Override
    public long extractTimestamp(String element, long previousElementTimestamp) {
        // previousElementTimestamp is the timestamp already attached to the record (here, the Kafka timestamp).
        currentMaxTimestamp = Math.max(previousElementTimestamp, currentMaxTimestamp);
        logger.info("TimestampAssigner2 - currentMaxTimestamp : {} res : {}, element : {} ", currentMaxTimestamp, previousElementTimestamp, element);
        return previousElementTimestamp;
    }

    @Override
    public Watermark getCurrentWatermark() {
        //Watermark watermarkRes = new Watermark(currentMaxTimestamp - maxOutOfOrderness);
        //Watermark watermarkRes = new Watermark(currentMaxTimestamp );
        // The watermark advances with the system clock even when no elements arrive.
        Watermark watermarkRes = new Watermark(System.currentTimeMillis() );
        //logger.info(String.format("watermarkRes : %s , this : %s ", watermarkRes, this));
        return watermarkRes;
    }
}

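Here is what I observed: the first one (TimestampAssigner1) does not advance when no new elements arrive from Kafka. I was able to verify this behavior: elements are received, but without new elements the window never completes. The second one (TimestampAssigner2) seems to progress fine, but as I understand it, because I use the system time, late elements will not be processed, since they will not be included in any window.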
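The trade-off between the two assigners can be illustrated with a minimal sketch of one possible middle ground. This is not a confirmed answer, only an illustration under stated assumptions. The class name IdleAwareWatermarkTracker and the idleTimeout parameter are hypothetical and do not come from Flink. The clock is passed in explicitly so the logic can be run without a Flink dependency. The watermark follows the highest timestamp seen (like TimestampAssigner1) while elements keep arriving, and it starts to advance with elapsed wall-clock time only after the source has been silent for longer than idleTimeout.

```java
// Hypothetical helper, not part of Flink: tracks a bounded-out-of-orderness
// watermark that keeps advancing when the source goes idle.
final class IdleAwareWatermarkTracker {
    private final long maxOutOfOrderness; // allowed lateness in ms
    private final long idleTimeout;       // assumed idle threshold in ms

    private long currentMaxTimestamp = Long.MIN_VALUE;
    private long lastElementWallClock = 0L;
    private long lastWatermark = Long.MIN_VALUE;

    IdleAwareWatermarkTracker(long maxOutOfOrderness, long idleTimeout) {
        this.maxOutOfOrderness = maxOutOfOrderness;
        this.idleTimeout = idleTimeout;
    }

    // Call once per element, with the element timestamp and the current wall-clock time.
    void onElement(long eventTimestamp, long nowMillis) {
        currentMaxTimestamp = Math.max(currentMaxTimestamp, eventTimestamp);
        lastElementWallClock = nowMillis;
    }

    // Call periodically; returns a watermark that never decreases.
    long currentWatermark(long nowMillis) {
        if (currentMaxTimestamp == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // nothing seen yet, so no progress can be claimed
        }
        long candidate = currentMaxTimestamp - maxOutOfOrderness;
        long idleFor = nowMillis - lastElementWallClock;
        if (idleFor > idleTimeout) {
            // Source is idle: move event time forward by the wall-clock time
            // that has elapsed beyond the timeout.
            candidate += idleFor - idleTimeout;
        }
        lastWatermark = Math.max(lastWatermark, candidate);
        return lastWatermark;
    }
}
```

Inside an AssignerWithPeriodicWatermarks, onElement(...) would be called from extractTimestamp() and currentWatermark(...) from getCurrentWatermark(), both with System.currentTimeMillis() as the clock. The cost is the same as with TimestampAssigner2, only delayed: an element that arrives after an idle period with a timestamp below the advanced watermark is treated as late.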

What is the correct way to handle this situation? My requirement is to process all events in a timely manner.

Thanks

0 answers:

No answers yet