Apache Flink: iterative stream does not loop

Time: 2018-05-10 14:22:02

Tags: mapreduce apache-flink flink-streaming connected-components

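I am implementing the connected components algorithm with Flink's DataStream API, since it has not yet been implemented with this API.
For this algorithm, I split the data with tumbling windows, so I try to compute the algorithm independently for each window.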
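
To make the windowing step concrete, here is a minimal plain-Java sketch of how a timestamp maps to a tumbling window. It only illustrates the idea and is not Flink's own code; the class and method names (`TumblingWindowSketch`, `windowStart`, `windowEnd`) are hypothetical, and it assumes non-negative timestamps in milliseconds and no window offset.

```java
// Hypothetical helper, not part of Flink: illustrates tumbling-window assignment.
final class TumblingWindowSketch {

    // Inclusive start of the window of length sizeMs that contains timestampMs.
    // Assumes timestampMs >= 0 and sizeMs > 0, with no window offset.
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    // Exclusive end of the same window.
    static long windowEnd(long timestampMs, long sizeMs) {
        return windowStart(timestampMs, sizeMs) + sizeMs;
    }

    public static void main(String[] args) {
        long windowSize = 1000L; // plays the role of windowSize in the question
        for (long ts : new long[] {0L, 999L, 1000L, 2500L}) {
            System.out.println(ts + " -> ["
                    + windowStart(ts, windowSize) + ", "
                    + windowEnd(ts, windowSize) + ")");
        }
    }
}
```

Under these assumptions, every element whose timestamp falls in the same `[start, end)` interval belongs to the same window, so each window can be treated as an independent graph.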

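My problem comes from the iterative nature of the algorithm. I implemented the iteration data pipeline (step data pipeline) that I want, which consists of FlatMaps, 1 Join, 1 ProcessWindow and 1 Filter. However, it seems that the stream I want to feed back does not actually get fed back to the beginning of the loop, because the algorithm does not iterate. I suspect that this is not possible when the original iterative data stream is joined with another stream (even if the latter is created by a flatMap of the former).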
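
For reference, the step that the pipeline is meant to repeat inside a single window can be sketched in plain Java, without Flink. This is an illustrative in-memory version of minimum-label propagation, not the streaming implementation. The names `LabelPropagationSketch` and `connectedComponents` are hypothetical, and the sketch assumes an undirected graph whose adjacency map lists every vertex as a key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of min-label propagation for one window.
final class LabelPropagationSketch {

    // adjacency: vertex -> neighbours (assumed symmetric, every vertex is a key).
    // Returns vertex -> smallest vertex id reachable from it.
    static Map<Integer, Integer> connectedComponents(Map<Integer, List<Integer>> adjacency) {
        // Every vertex starts with its own id as label.
        Map<Integer, Integer> labels = new HashMap<>();
        for (Integer vertex : adjacency.keySet()) {
            labels.put(vertex, vertex);
        }

        boolean converged = false;
        while (!converged) {
            converged = true;

            // "flatMap + minBy": each vertex emits its label to itself and to
            // its neighbours; keep the minimum label seen per vertex.
            Map<Integer, Integer> minSeen = new HashMap<>();
            for (Map.Entry<Integer, List<Integer>> entry : adjacency.entrySet()) {
                int label = labels.get(entry.getKey());
                minSeen.merge(entry.getKey(), label, Math::min);
                for (Integer neighbour : entry.getValue()) {
                    minSeen.merge(neighbour, label, Math::min);
                }
            }

            // "join": update a vertex when a smaller label arrived. Any update
            // means the window has not converged and must be fed back.
            for (Map.Entry<Integer, Integer> entry : minSeen.entrySet()) {
                Integer current = labels.get(entry.getKey());
                if (current != null && entry.getValue() < current) {
                    labels.put(entry.getKey(), entry.getValue());
                    converged = false;
                }
            }
        }
        return labels;
    }
}
```

In this sketch a round counts as converged only when no vertex changed its label, which is the condition the feedback edge of the streaming loop has to reproduce.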

The code I am using is as follows:

//neigborsList = Datastream of <Vertex, [List of neighbors], label>
IterativeStream< Tuple3<Integer, ArrayList<Integer>, Integer> > beginning_loop = neigborsList.iterate(maxTimeout);

//Emits tuples Vertices and Labels for every vertex and its neighbors

DataStream<Tuple2<Integer,Integer> > labels = beginning_loop
        //Datastream of <Vertex, label> for every neigborsList.f0 and element in neigborsList.f1
        .flatMap( new EmitVertexLabel() ) 
        .keyBy(0)
        .window(TumblingEventTimeWindows.of(Time.milliseconds(windowSize)))
        .minBy(1)               
        ;


DataStream<Tuple4<Integer, ArrayList<Integer>, Integer, Integer>> updatedVertex = beginning_loop                
            //Update vertex label with the results from the labels reduction
            .join(labels)
            .where("vertex")
            .equalTo("vertex")
            .window(TumblingEventTimeWindows.of(Time.milliseconds(windowSize)))
            .apply(new JoinFunction<Tuple3<Integer,ArrayList<Integer>,Integer>, Tuple2<Integer,Integer>, Tuple4<Integer,ArrayList<Integer>,Integer,Integer>>() {

                @Override
                public Tuple4<Integer,ArrayList<Integer>,Integer,Integer> join(
                        Tuple3<Integer, ArrayList<Integer>, Integer> arg0, Tuple2<Integer, Integer> arg1)
                        throws Exception {
                    int hasConverged = 1;
                    if(arg1.f1.intValue() < arg0.f2.intValue() )
                    {
                        arg0.f2 = arg1.f1;
                        hasConverged=0;
                    }
                    return new Tuple4<>(arg0.f0,arg0.f1,arg0.f2,new Integer(hasConverged));
                }                       

            })

            //Disseminates the convergence flag if a change was made in the window
            .windowAll(TumblingEventTimeWindows.of(Time.milliseconds(windowSize)))
            .process(new ProcessAllWindowFunction<Tuple4<Integer,ArrayList<Integer>,Integer,Integer>,Tuple4<Integer, ArrayList<Integer>, Integer, Integer>,TimeWindow >() {

                @Override
                public void process(
                        ProcessAllWindowFunction<Tuple4<Integer, ArrayList<Integer>, Integer, Integer>, Tuple4<Integer, ArrayList<Integer>, Integer, Integer>, TimeWindow>.Context ctx,
                        Iterable<Tuple4<Integer, ArrayList<Integer>, Integer, Integer>> values,
                        Collector<Tuple4<Integer, ArrayList<Integer>, Integer, Integer>> out) throws Exception {

                    Iterator<Tuple4<Integer, ArrayList<Integer>, Integer, Integer>> iterator = values.iterator();
                    Tuple4<Integer, ArrayList<Integer>, Integer, Integer> element;

                    int hasConverged= 1;
                    while(iterator.hasNext())
                    {
                        element = iterator.next();
                        if(element.f3.intValue()>0)
                        {
                            hasConverged=0;
                            break;
                        }

                    }

                    //Re iterate and emit the values on the correct output
                    iterator = values.iterator();                           
                    Integer converged = new Integer(hasConverged);
                    while(iterator.hasNext())
                    {
                        element = iterator.next();
                        element.f3 = converged;
                        out.collect(element);

                    }                                                   
                }
            })              

            ;


DataStream<Tuple3<Integer, ArrayList<Integer>, Integer>> feed_back = updatedVertex
        .filter(new NotConvergedFilter())                               
        //Remove the finished convergence flag
        //Transforms the Tuples4 to Tuples3 so that it becomes compatible with beginning_loop
        .map(new RemoveConvergeceFlag())
        ;


beginning_loop.closeWith(feed_back);

//Selects the windows that have already converged
DataStream<?> convergedWindows = updatedVertex
        .filter(new ConvergedFilter() );


convergedWindows.print()
.setParallelism(1)
.name("Sink to stdout");

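At the end of the execution, convergedWindows has not received any tuples (as the algorithm could only converge in 1 iteration). If I print beginning_loop, I see the initial tuples and the tuples that come from the feed_back result of the first iteration. However, there is nothing else beyond that.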

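So, to summarize my question: could this be a limitation of Flink? If so, do you know another way to update the vertex labels after the initial reduction, one that is not based on a join?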

P.S. I am using Flink 1.3.3.

0 answers:

No answers