We read data from Kafka. Each message can be simplified to a Tuple2, where the String is the key and the Integer is the type (which can be 1, 2, or 3), e.g.
('key001', 1)
('key001', 2)
('key001', 3)
('key001', 3)
('key002', 1)
('key002', 2)
('key003', 1)
('key004', 1)
We want to collect some statistics over a 10-minute period.
I have tried the code below and it seems to work, but it feels convoluted to me. Is this the right approach?
I had to use two time windows here, because a single time window did not give me what I wanted, and I am still not clear about how this actually works. Can anyone explain what happens when multiple time windows are applied to a stream?
SingleOutputStreamOperator<Tuple2<String, Long>> x = ds.keyBy(0)
        .timeWindow(Time.seconds(600))
        .process(new ProcessWindowFunction<Tuple2<String, Integer>, Tuple2<String, Long>, Tuple, TimeWindow>() {
            @Override
            public void process(Tuple key,
                    ProcessWindowFunction<Tuple2<String, Integer>, Tuple2<String, Long>, Tuple, TimeWindow>.Context ctx,
                    Iterable<Tuple2<String, Integer>> elements,
                    Collector<Tuple2<String, Long>> out) throws Exception {
                // Record which types appeared at least once for this key in this window
                boolean hasType1 = false;
                boolean hasType2 = false;
                boolean hasType3 = false;
                for (Tuple2<String, Integer> t2 : elements) {
                    if (t2.f1 == 1) {
                        hasType1 = true;
                    } else if (t2.f1 == 2) {
                        hasType2 = true;
                    } else if (t2.f1 == 3) {
                        hasType3 = true;
                    }
                    // All three types seen; no need to scan the rest of the window
                    if (hasType1 && hasType2 && hasType3) {
                        break;
                    }
                }
                // Emit one marker record per satisfied (cumulative) condition
                if (hasType1) {
                    out.collect(new Tuple2<>("hasType1", 1L));
                    if (hasType2) {
                        out.collect(new Tuple2<>("hasType1_Type2", 1L));
                        if (hasType3) {
                            out.collect(new Tuple2<>("hasType1_Type2_Type3", 1L));
                        }
                    }
                }
            }
        });
// Second window: count the markers per label over the same 10 minutes and write to HDFS
x.keyBy(0)
        .timeWindow(Time.seconds(600))
        .sum(1)
        .map(new MapFunction<Tuple2<String, Long>, String>() {
            @Override
            public String map(Tuple2<String, Long> value) throws Exception {
                return value.f0 + " = " + value.f1;
            }
        })
        .addSink(new BucketingSink<String>("hdfs://..."))
        .setParallelism(1);
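For what it's worth, the per-window emit logic of the first `ProcessWindowFunction` can be isolated into a plain method and checked without Flink. This is only a sketch for reasoning about the question; the class name `TypeFlags` and the method `emittedLabels` are made up and are not part of the job above.

```java
import java.util.ArrayList;
import java.util.List;

public class TypeFlags {
    // Mirrors the window function body: scan the types seen for one key
    // in one window and return the cumulative labels that would be emitted.
    public static List<String> emittedLabels(List<Integer> types) {
        boolean hasType1 = false, hasType2 = false, hasType3 = false;
        for (int t : types) {
            if (t == 1) hasType1 = true;
            else if (t == 2) hasType2 = true;
            else if (t == 3) hasType3 = true;
            // All three types seen; stop scanning early
            if (hasType1 && hasType2 && hasType3) break;
        }
        List<String> out = new ArrayList<>();
        if (hasType1) {
            out.add("hasType1");
            if (hasType2) {
                out.add("hasType1_Type2");
                if (hasType3) out.add("hasType1_Type2_Type3");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // key001's window in the sample data contains types 1, 2, 3, 3
        System.out.println(emittedLabels(List.of(1, 2, 3, 3)));
        // key003's window contains only type 1
        System.out.println(emittedLabels(List.of(1)));
    }
}
```

Note that because the labels are nested, a key that has type 2 or 3 but never type 1 contributes nothing to the output.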