Flink window join not working when using event time and a timestamp assigner

Posted: 2017-10-24 12:54:39

Tags: apache-flink flink-streaming

I just ran into a very strange problem: when I use EventTime with a timestamp and watermark assigner, I get no results at all from a windowed stream join.

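I'm using Kafka as my data stream source. I have tried both AscendingTimestampExtractor and a custom assigner implementing AssignerWithPeriodicWatermarks, as described in the Flink documentation here. In everything I have tested, no watermark is emitted and no join result is produced. If I switch to ProcessingTime with TumblingProcessingTimeWindows and no timestamp assigner, I get the correct results.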

The code with the custom timestamp and watermark assigner is as follows:

```java
FlinkKafkaConsumer09<String> myConsumer1 =
        new FlinkKafkaConsumer09<>(myTopic1, new SimpleStringSchema(), props);
myConsumer1.assignTimestampsAndWatermarks(new MyTimestampsAndWatermarks());

FlinkKafkaConsumer09<String> myConsumer2 =
        new FlinkKafkaConsumer09<>(myTopic2, new SimpleStringSchema(), props);
myConsumer2.assignTimestampsAndWatermarks(new MyTimestampsAndWatermarks());
...
public static class MyTimestampsAndWatermarks implements AssignerWithPeriodicWatermarks<String> {
    // Highest event timestamp seen so far by this assigner instance.
    private long currentMaxTimestamp;

    @Override
    public long extractTimestamp(String element, long previousElementTimestamp) {
        long timestamp = myFunctionToGetMillisFromString(element);
        currentMaxTimestamp = Math.max(timestamp, currentMaxTimestamp);
        return timestamp;
    }

    // Called periodically; the watermark trails the highest timestamp by 1 ms.
    @Override
    public Watermark getCurrentWatermark() {
        return new Watermark(currentMaxTimestamp - 1L);
    }
}
...
DataStream<myPOJO1> stream1 = env.addSource(myConsumer1).map(new MyMapper1());
DataStream<myPOJO2> stream2 = env.addSource(myConsumer2).map(new MyMapper2());
stream1.join(stream2)
    .where(new KeySelector1())
    .equalTo(new KeySelector2())
    .window(TumblingEventTimeWindows.of(Time.seconds(windowSize)))
    .apply(new JoinFunction<AdClick, GameCreate, TransferResult>() {...});
```
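
To make the timing concrete, the following is a minimal plain-Java simulation (no Flink dependency) of how a watermark of `currentMaxTimestamp - 1` drives a tumbling event-time window. It assumes a zero window offset and non-negative millisecond timestamps, for which the tumbling window containing timestamp `t` starts at `t - t % size`, and it models the event-time trigger as firing a window once the watermark reaches the window's last timestamp, `end - 1`. The names `WatermarkWindowDemo`, `PeriodicAssigner` and `windowFires` are hypothetical, and the event timestamps are made-up sample values.

```java
public class WatermarkWindowDemo {

    // Hypothetical stand-in mirroring MyTimestampsAndWatermarks above:
    // the watermark is the highest timestamp seen so far minus 1 ms.
    static final class PeriodicAssigner {
        private long currentMaxTimestamp; // starts at 0, as in the question's class

        long extractTimestamp(long eventTimestamp) {
            currentMaxTimestamp = Math.max(eventTimestamp, currentMaxTimestamp);
            return eventTimestamp;
        }

        long currentWatermark() {
            return currentMaxTimestamp - 1L;
        }
    }

    // Start of the tumbling window containing t (assumes zero offset and t >= 0).
    static long windowStart(long t, long size) {
        return t - (t % size);
    }

    // Models an event-time window [start, start + size) firing once the
    // watermark reaches its last timestamp, start + size - 1.
    static boolean windowFires(long start, long size, long watermark) {
        return watermark >= start + size - 1;
    }

    public static void main(String[] args) {
        long size = 5_000L; // 5-second windows
        PeriodicAssigner assigner = new PeriodicAssigner();
        long[] events = {1_000L, 3_000L, 4_999L, 6_000L}; // made-up sample timestamps

        for (long ts : events) {
            assigner.extractTimestamp(ts);
            long wm = assigner.currentWatermark();
            long start = windowStart(ts, size);
            System.out.printf("event=%d window=[%d,%d) watermark=%d window[0,5000)fires=%b%n",
                    ts, start, start + size, wm, windowFires(0L, size, wm));
        }
    }
}
```

Under these assumptions the window `[0, 5000)` only fires after an element with a timestamp of at least 5000 has been seen, so the most recent window of a stream that stops receiving data never produces output.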

My AscendingTimestampExtractor code is as follows:

```java
FlinkKafkaConsumer09<String> myConsumer1 =
        new FlinkKafkaConsumer09<>(myTopic1, new SimpleStringSchema(), props);
myConsumer1.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<String>() {
    @Override
    public long extractAscendingTimestamp(String element) {
        return myFunctionToGetMillisFromString(element);
    }
});

FlinkKafkaConsumer09<String> myConsumer2 =
        new FlinkKafkaConsumer09<>(myTopic2, new SimpleStringSchema(), props);
myConsumer2.assignTimestampsAndWatermarks(new AscendingTimestampExtractor<String>() {
    @Override
    public long extractAscendingTimestamp(String element) {
        return myFunctionToGetMillisFromString(element);
    }
});
...
DataStream<myPOJO1> stream1 = env.addSource(myConsumer1).map(new MyMapper1());
DataStream<myPOJO2> stream2 = env.addSource(myConsumer2).map(new MyMapper2());
stream1.join(stream2)
    .where(new KeySelector1())
    .equalTo(new KeySelector2())
    .window(TumblingEventTimeWindows.of(Time.seconds(windowSize)))
    .apply(new JoinFunction<AdClick, GameCreate, TransferResult>() {...});
```
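
A windowed join consumes two inputs, and in Flink an operator with several inputs advances its event-time clock to the minimum of the watermarks arriving on those inputs. If either source stops advancing its watermark (for example, because no new records arrive), no join window can fire even though the other stream keeps progressing. The plain-Java sketch below illustrates that rule; the names `JoinWatermarkDemo`, `ascendingWatermark` and `combinedWatermark` are hypothetical, the numbers are made up, and `ascendingWatermark` is our assumed model of AscendingTimestampExtractor's watermark (latest timestamp minus 1 ms, or `Long.MIN_VALUE` before any element).

```java
import java.util.Arrays;

public class JoinWatermarkDemo {

    // Assumed model of AscendingTimestampExtractor's watermark: latest timestamp - 1,
    // or Long.MIN_VALUE if no element has been seen yet.
    static long ascendingWatermark(long latestTimestamp) {
        return latestTimestamp == Long.MIN_VALUE ? Long.MIN_VALUE : latestTimestamp - 1;
    }

    // A multi-input operator's event time is the minimum of its input watermarks.
    static long combinedWatermark(long... inputWatermarks) {
        return Arrays.stream(inputWatermarks).min().orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        long windowEnd = 5_000L; // tumbling window [0, 5000)

        long clicksWm = ascendingWatermark(12_000L);         // stream1 is well past the window
        long createsWm = ascendingWatermark(Long.MIN_VALUE); // stream2 has produced nothing yet
        long joinWm = combinedWatermark(clicksWm, createsWm);

        // The window [0, 5000) can only fire once the join's watermark reaches 4999.
        System.out.println("join watermark = " + joinWm
                + ", window fires = " + (joinWm >= windowEnd - 1));
    }
}
```

Processing-time windows do not depend on watermarks at all, which is one reason a job can produce output with TumblingProcessingTimeWindows and none with TumblingEventTimeWindows.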

Thanks for your help!

2 answers:

Answer 0 (score: 0)

```java
myConsumer3 = myConsumer1.assign***
myConsumer4 = myConsumer2.assign***
```

Then use myConsumer3 and myConsumer4 instead, and it will work correctly.

Answer 1 (score: 0)

I ran into the same problem. It was a really silly mistake, and I found the solution here.

When you write:

```java
myConsumer1.assignTimestampsAndWatermarks(new MyTimestampsAndWatermarks());
```

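it creates a new data stream instead of modifying the existing one, and you are not storing that new stream in a variable. So the bottom line is: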

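Store the result in a new data stream and apply the join to that stream (it is the one that carries the assigned timestamps and watermarks).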
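
The pitfall this answer describes, calling a method that returns a new object and then ignoring the result, can be shown without Flink. In the sketch below, `Stream` and `withTimestamps` are hypothetical stand-ins, not Flink classes. Like DataStream.assignTimestampsAndWatermarks, which returns a new stream, `withTimestamps` leaves the receiver unchanged, so only the returned value carries the timestamps.

```java
public class ReturnValueDemo {

    // Hypothetical immutable stream: methods return new instances instead of mutating.
    static final class Stream {
        final String name;
        final boolean hasTimestamps;

        Stream(String name, boolean hasTimestamps) {
            this.name = name;
            this.hasTimestamps = hasTimestamps;
        }

        // Returns a NEW stream with timestamps; `this` is left untouched.
        Stream withTimestamps() {
            return new Stream(name + "+timestamps", true);
        }
    }

    public static void main(String[] args) {
        Stream source = new Stream("kafka", false);

        source.withTimestamps(); // result discarded: source still has no timestamps
        System.out.println("source has timestamps: " + source.hasTimestamps);

        Stream timestamped = source.withTimestamps(); // keep the returned stream
        System.out.println("timestamped has timestamps: " + timestamped.hasTimestamps);
    }
}
```

Any downstream operation (here, the join) has to be built on the returned object, not on the original one.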