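I am new to Flink and am trying to apply windowing. My source is Kafka, and my model does not carry any event-time information, so I want to use the Kafka record timestamps together with the assignTimestampsAndWatermarks() method.
I implemented two timestamp assigners, shown below.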
    // Imports needed by both assigners
    import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
    import org.apache.flink.streaming.api.watermark.Watermark;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class TimestampAssigner1 implements AssignerWithPeriodicWatermarks<String> {
        // Static, so the logger is not serialized together with the assigner
        private static final Logger logger = LoggerFactory.getLogger(TimestampAssigner1.class);
        private static final long serialVersionUID = 1L;
        private long currentMaxTimestamp;
        private final long maxOutOfOrderness = 3500; // 3.5 seconds

        @Override
        public long extractTimestamp(String element, long previousElementTimestamp) {
            // previousElementTimestamp is the timestamp attached by the Kafka source
            currentMaxTimestamp = Math.max(previousElementTimestamp, currentMaxTimestamp);
            logger.info(String.format("TimestampAssigner1 - currentMaxTimestamp : %s res : %s, element : %s ", currentMaxTimestamp, previousElementTimestamp, element));
            return previousElementTimestamp;
        }

        @Override
        public Watermark getCurrentWatermark() {
            // Event-time watermark: trails the highest timestamp seen so far
            Watermark watermarkRes = new Watermark(currentMaxTimestamp - maxOutOfOrderness);
            //Watermark watermarkRes = new Watermark(currentMaxTimestamp );
            //Watermark watermarkRes = new Watermark(System.currentTimeMillis() );
            //logger.info(String.format("watermarkRes : %s , this : %s ", watermarkRes, this));
            return watermarkRes;
        }
    }

    public class TimestampAssigner2 implements AssignerWithPeriodicWatermarks<String> {
        // Static, so the logger is not serialized together with the assigner
        private static final Logger logger = LoggerFactory.getLogger(TimestampAssigner2.class);
        private static final long serialVersionUID = 1L;
        private long currentMaxTimestamp;
        private final long maxOutOfOrderness = 3500; // 3.5 seconds (unused by this variant)

        @Override
        public long extractTimestamp(String element, long previousElementTimestamp) {
            // previousElementTimestamp is the timestamp attached by the Kafka source
            currentMaxTimestamp = Math.max(previousElementTimestamp, currentMaxTimestamp);
            logger.info(String.format("TimestampAssigner2 - currentMaxTimestamp : %s res : %s, element : %s ", currentMaxTimestamp, previousElementTimestamp, element));
            return previousElementTimestamp;
        }

        @Override
        public Watermark getCurrentWatermark() {
            //Watermark watermarkRes = new Watermark(currentMaxTimestamp - maxOutOfOrderness);
            //Watermark watermarkRes = new Watermark(currentMaxTimestamp );
            // Wall-clock watermark: advances even when no elements arrive
            Watermark watermarkRes = new Watermark(System.currentTimeMillis() );
            //logger.info(String.format("watermarkRes : %s , this : %s ", watermarkRes, this));
            return watermarkRes;
        }
    }

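Here is what I observe: the first one (TimestampAssigner1) makes no progress unless new elements arrive from Kafka. I can verify this behaviour: the elements are received, but the window never completes when no further elements follow. The second one (TimestampAssigner2) seems to progress fine, but as I understand it, because the watermark is taken from the system time, late elements will not be processed, since they will no longer be included in any window.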
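To make the two behaviours concrete, here is a small plain-Java sketch with no Flink dependency. It only mimics the arithmetic of the two assigners. The class name `WatermarkSketch`, the 10-second tumbling window, the sample timestamps, and the simulated wall-clock value are all made up for illustration. The firing rule (a window fires once the watermark reaches its end minus 1 ms, and an element is dropped as late once that has happened) is my understanding of the default event-time trigger with zero allowed lateness, so please treat it as an assumption.

```java
class WatermarkSketch {
    static final long MAX_OUT_OF_ORDERNESS = 3_500; // same bound as in the assigners
    static final long WINDOW_SIZE = 10_000;         // hypothetical 10 s tumbling window

    // Watermark as computed by TimestampAssigner1
    static long eventTimeWatermark(long currentMaxTimestamp) {
        return currentMaxTimestamp - MAX_OUT_OF_ORDERNESS;
    }

    // Exclusive end of the tumbling window containing a non-negative timestamp
    static long windowEnd(long timestamp) {
        return timestamp - (timestamp % WINDOW_SIZE) + WINDOW_SIZE;
    }

    // Assumed rule: the window fires once the watermark reaches its last millisecond
    static boolean windowFires(long windowEnd, long watermark) {
        return watermark >= windowEnd - 1;
    }

    // Assumed rule: with zero allowed lateness, an element is dropped
    // if its window has already fired
    static boolean isDroppedAsLate(long timestamp, long watermark) {
        return windowFires(windowEnd(timestamp), watermark);
    }

    public static void main(String[] args) {
        // Three elements arrive, then the Kafka topic goes quiet
        long[] timestamps = {1_000, 4_000, 9_000};
        long currentMaxTimestamp = 0;
        for (long ts : timestamps) {
            currentMaxTimestamp = Math.max(ts, currentMaxTimestamp);
        }

        // Assigner 1: the watermark is stuck below the window end until a newer element arrives
        long stuckWatermark = eventTimeWatermark(currentMaxTimestamp);
        System.out.println("assigner 1 fires window: " + windowFires(windowEnd(9_000), stuckWatermark));

        // Assigner 2: the wall clock (simulated here) has already moved far past the window
        long simulatedWallClock = 20_000;
        System.out.println("assigner 2 fires window: " + windowFires(windowEnd(9_000), simulatedWallClock));

        // An element stamped 9_500 that shows up afterwards has missed its window
        System.out.println("late element dropped: " + isDroppedAsLate(9_500, simulatedWallClock));
    }
}
```

With these sample values the watermark of the first assigner stays at 5500 and never reaches the window end, while the wall-clock watermark closes the window before the element stamped 9500 arrives.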
What is the correct way to handle this situation? My requirement is that all events are processed in a timely manner.
Thanks.