Apache Flink join

Date: 2018-02-22 00:36:41

Tags: apache-flink flink-streaming

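In Apache Flink, I have two streams of Tuple8<>, call them in and exit. Four of the eight fields of each event tuple (a Tuple4) act as the key. I want to correlate the records that exist in both streams, and as a step toward that I connect the two streams with the join operator. According to the semantics, I should get an output stream containing the inner-joined records. However, I get no output and no matches at all. The environment's time characteristic is set to event time. The first element of the tuple is the timestamp, which I extract and assign with assignTimestampsAndWatermarks.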

DataStream<String> input = env.readTextFile("/tmp/logScrape/out/raw-input.out");

DataStream<Tuple8<Long, Integer, String, Integer, String, Integer, Integer, String>> inFiltered =
        input.flatMap(new Splitter())
                .filter(new InFilter())
                .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<Tuple8<Long, Integer, String, Integer, String, Integer, Integer, String>>(Time.seconds(10)) {
                    @Override
                    public long extractTimestamp(Tuple8<Long, Integer, String, Integer, String, Integer, Integer, String> record) {
                        return record.f0;
                    }
                });

DataStream<Tuple8<Long, Integer, String, Integer, String, Integer, Integer, String>> exitFiltered =
        input.flatMap(new Splitter())
                .filter(new ExitFilter())
                .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<Tuple8<Long, Integer, String, Integer, String, Integer, Integer, String>>(Time.seconds(10)) {
                    @Override
                    public long extractTimestamp(Tuple8<Long, Integer, String, Integer, String, Integer, Integer, String> record) {
                        return record.f0;
                    }
                });

inFiltered.join(exitFiltered)
        .where(new TupleKeySelector())
        .equalTo(new TupleKeySelector())
        .window(TumblingEventTimeWindows.of(Time.milliseconds(1000000)))
        .apply(new StreamJoinner())
        .writeAsText("/tmp/logScrape/out/output");

public static class TupleKeySelector implements KeySelector<
        Tuple8<Long, Integer, String, Integer, String, Integer, Integer, String>,
        Tuple4<String, Integer, String, Integer>> {
    @Override
    public Tuple4<String, Integer, String, Integer> getKey(Tuple8<Long, Integer, String, Integer, String, Integer, Integer, String> value) {
        return new Tuple4<>(value.f2, value.f3, value.f4, value.f5);
    }
}

Here are the output records I get for inFiltered:
(1519254461076381189,1234,program1,11,program2,20,27,in)
(1519254462071697685,1234,program1,11,program2,20,27,in)
(1519254463067014246,1234,program1,11,program2,20,27,in)

Here are the output records I get for exitFiltered:
(1519254458167640292,6789,program1,11,program2,20,27,out)
(1519254460158076301,6789,program1,11,program2,20,27,out)
(1519254461153294238,6789,program1,11,program2,20,27,out)
(1519254462148512207,6789,program1,11,program2,20,27,out)
(1519254463143730191,6789,program1,11,program2,20,27,out)

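Questions:

  • What am I missing here that prevents me from seeing any joined results?
  • Is there a way to debug the code while it is processing? I cannot tell whether the problem in my case is the key selector or the windowing not happening as expected.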
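
One way to rule out the key selector without running the job is to apply the same field selection to one sample record from each stream and compare the results. The following is a minimal sketch in plain Java, not Flink code: `KeyCheck` and its `key` helper are hypothetical stand-ins for TupleKeySelector, a `List` stands in for Tuple4, and it assumes the key type compares field by field.

```java
import java.util.Arrays;
import java.util.List;

public class KeyCheck {
    // Hypothetical stand-in for TupleKeySelector: selects fields f2..f5 of a record.
    static List<Object> key(Object[] record) {
        return Arrays.asList(record[2], record[3], record[4], record[5]);
    }

    public static void main(String[] args) {
        // One sample record from each stream, copied from the output listed above.
        Object[] in = {1519254461076381189L, 1234, "program1", 11, "program2", 20, 27, "in"};
        Object[] out = {1519254461153294238L, 6789, "program1", 11, "program2", 20, 27, "out"};

        // Equal keys mean the key selector is not what prevents the join from matching.
        System.out.println("keys equal: " + key(in).equals(key(out)));
    }
}
```

For the sample records the two keys are equal, which points away from the key selector and toward the timestamps or the window.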

1 answer:

Answer 0 (score: 0)

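You have a tumbling window of 1 million milliseconds, right? Looking at the timestamps of the two filtered streams (the first field, right?), I don't see any two timestamps that fall within the same 1M-millisecond window.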
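
The answer's observation can be checked with plain arithmetic. The sketch below is not Flink code: `windowStart` is a hypothetical helper that aligns a non-negative timestamp to a multiple of the window size, which is how a tumbling window with zero offset assigns windows. The sample timestamps have 19 digits, which suggests nanoseconds since the epoch, whereas Flink interprets event timestamps as milliseconds. The division by 1,000,000 is therefore an assumption about the unit of the data, not something the question confirms.

```java
public class WindowCheck {
    // Window size passed to TumblingEventTimeWindows in the question.
    static final long WINDOW_SIZE = 1_000_000L;

    // Hypothetical helper: start of the zero-offset tumbling window containing the timestamp.
    static long windowStart(long timestamp, long windowSize) {
        return timestamp - (timestamp % windowSize);
    }

    public static void main(String[] args) {
        // First field of one record from each stream, as listed in the question.
        long in = 1519254461076381189L;
        long out = 1519254461153294238L;

        // Raw values: the two records land in different windows, so the join never matches.
        System.out.println(windowStart(in, WINDOW_SIZE) == windowStart(out, WINDOW_SIZE)); // false

        // Assumption: the values are nanoseconds, so convert them to milliseconds first.
        long inMs = in / 1_000_000L;
        long outMs = out / 1_000_000L;

        // Converted values: both records fall into the same window.
        System.out.println(windowStart(inMs, WINDOW_SIZE) == windowStart(outMs, WINDOW_SIZE)); // true
    }
}
```

If the unit assumption holds, returning `record.f0 / 1_000_000` from extractTimestamp in both streams would be one way to get the sample records into a common window.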