Is there any way to update an RDD inside a lambda function?

Time: 2019-04-04 10:47:12

Tags: apache-spark spark-streaming apache-spark-2.0

I need to compute word counts from streaming data, where the count reported for certain words is the difference between the previous batch and the current one.

Example: I have to find word counts from the streaming data, tracking the words (test, jack). If my first batch contains (test 5), (kite 2), (jack 3), (pen 5), and the second batch contains (test 2), (jack 1), (kite 1), (pen 1), then the output for the second batch should be: (test 3), (jack 2) // test (5-2) and jack (3-1)
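For reference, here is a minimal, non-streaming sketch of the per-batch arithmetic (not from the original post; the class name DiffExample and the local-mode setup are assumptions for illustration). Joining the new batch against an RDD that holds counts for the tracked words only (test, jack) both filters out other words and pairs up the counts:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class DiffExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("DiffExample");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Counts from the previous batch, for the tracked words only.
        JavaPairRDD<String, Integer> previous = sc.parallelizePairs(Arrays.asList(
                new Tuple2<>("test", 5), new Tuple2<>("jack", 3)));

        // The current batch may contain other words; the join drops them.
        JavaPairRDD<String, Integer> current = sc.parallelizePairs(Arrays.asList(
                new Tuple2<>("test", 2), new Tuple2<>("jack", 1),
                new Tuple2<>("kite", 1), new Tuple2<>("pen", 1)));

        // join yields (word, (currentCount, previousCount)); subtract for the diff.
        JavaPairRDD<String, Integer> diff = current.join(previous)
                .mapToPair(t -> new Tuple2<>(t._1(), t._2()._2() - t._2()._1()));

        System.out.println(diff.collect());  // e.g. [(test,3), (jack,2)]
        sc.stop();
    }
}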

My code so far:

JavaPairRDD<String, Integer> initialRDD = jsc.sparkContext().parallelizePairs(tuples1);

JavaPairDStream<String, Integer> joinedDstream = pairstream.transformToPair(
        new Function<JavaPairRDD<String, Integer>, JavaPairRDD<String, Integer>>() {
            @Override
            public JavaPairRDD<String, Integer> call(JavaPairRDD<String, Integer> v1) throws Exception {
                // join yields (word, (currentCount, previousCount)); keep previous - current
                JavaPairRDD<String, Integer> modRDD = v1.join(initialRDD).mapToPair(
                        new PairFunction<Tuple2<String, Tuple2<Integer, Integer>>, String, Integer>() {
                            @Override
                            public Tuple2<String, Integer> call(Tuple2<String, Tuple2<Integer, Integer>> t) throws Exception {
                                return new Tuple2<>(t._1(), t._2()._2() - t._2()._1());
                            }
                        });

                // initialRDD = modRDD;  // <-- this is what I want to do, but the compiler
                //                       // rejects it: variables captured by an anonymous
                //                       // class (or lambda) must be effectively final.
                return modRDD;
            }
        });
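The assignment fails because an anonymous inner class (like a lambda) can only capture local variables that are final or effectively final. A common workaround is to hold the previous batch's RDD in a mutable container such as java.util.concurrent.atomic.AtomicReference: the holder itself stays effectively final, and since transform functions run on the driver once per batch, swapping its contents is safe. The following is a self-contained sketch under those assumptions; the class name RunningDiff and the queue-stream input are stand-ins for the real pairstream, not code from the question.

import java.util.Arrays;
import java.util.LinkedList;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import scala.Tuple2;

public class RunningDiff {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("RunningDiff");
        JavaStreamingContext jsc = new JavaStreamingContext(conf, Durations.seconds(5));

        // Stand-in for the real stream: two pre-counted batches fed through a queue.
        Queue<JavaRDD<Tuple2<String, Integer>>> queue = new LinkedList<>();
        queue.add(jsc.sparkContext().parallelize(Arrays.asList(
                new Tuple2<>("test", 2), new Tuple2<>("jack", 1),
                new Tuple2<>("kite", 1), new Tuple2<>("pen", 1))));
        queue.add(jsc.sparkContext().parallelize(Arrays.asList(
                new Tuple2<>("test", 1), new Tuple2<>("jack", 1))));
        JavaPairDStream<String, Integer> pairstream = jsc.queueStream(queue).mapToPair(t -> t);

        // The holder is effectively final, so the lambda may capture it;
        // only its contents change from batch to batch.
        AtomicReference<JavaPairRDD<String, Integer>> previous = new AtomicReference<>(
                jsc.sparkContext().parallelizePairs(Arrays.asList(
                        new Tuple2<>("test", 5), new Tuple2<>("jack", 3))));

        JavaPairDStream<String, Integer> diffed = pairstream.transformToPair(current -> {
            // (word, (currentCount, previousCount)) -> (word, previous - current)
            JavaPairRDD<String, Integer> diff = current.join(previous.get())
                    .mapToPair(t -> new Tuple2<>(t._1(), t._2()._2() - t._2()._1()));
            diff.cache();        // the result is joined again when the next batch arrives
            previous.set(diff);  // safe: transform functions execute on the driver
            return diff;
        });

        diffed.print();  // batch 1: (test,3), (jack,2); batch 2: (test,2), (jack,1)
        jsc.start();
        jsc.awaitTermination();
    }
}

Two caveats with this pattern: each batch's result feeds the next join, so the lineage grows without bound unless checkpointing is enabled (jsc.checkpoint(...)) or the RDD is periodically materialized; and if per-key state across batches is the real goal, updateStateByKey or mapWithState on the JavaPairDStream are the idiomatic alternatives.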

0 Answers:

No answers