Apache Spark Function2 not declared correctly

Asked: 2016-07-19 20:56:56

Tags: java apache-spark map-function

(Background: this comes from trying to map an Integer RDD to a TholdDropResult RDD, but we need to initialize a single SparkDoDrop to generate all 10^8 TholdDropResults, hence mapPartitionsWithIndex, the only flavor of mapPartitions in the Java API that provides the function type we need.)

Problem: I am using org.apache.spark.api.java.function.Function2 and getting an error message; I cannot figure out how the boolean is supposed to fit into the new Function2.

When I try this code, it is the new Function2 declaration that seems to be giving me trouble (builder-style formatting added, per the answer):

JavaRDD<TholdDropResult> dropResultsN = dataSetN.mapPartitionsWithIndex(
                                      new Function2<Integer, 
                                      Iterator<Integer>, 
                                      Iterator<TholdDropResult>>(){

        @Override
        public Iterator<TholdDropResult> call(Integer partitionID, Iterator<Integer> integerIterator) throws Exception {
            //
            SparkDoDrop standin = makeNewSparkDoDrop();
            standin.initializeLI();
            List<TholdDropResult> rddToReturn = new ArrayList<>();
            while (integerIterator.hasNext()){
                rddToReturn.add(standin.call(integerIterator.next()));
            }
            return rddToReturn.iterator();

        }});
    dropResultsN.persist(StorageLevel.MEMORY_ONLY());

Here is the full error when I run gradle build:

JavaRDD<TholdDropResult> dropResultsN = dataSetN.mapPartitionsWithIndex(new Function2<Integer, Iterator<Integer>, Iterator<TholdDropResult>>(){
required: Function2<Integer,Iterator<Integer>,Iterator<R>>,boolean
  found: <anonymous Function2<Integer,Iterator<Integer>,Iterator<TholdDropResult>>>
  reason: cannot infer type-variable(s) R
    (actual and formal argument lists differ in length)
  where R,T,This are type-variables:
    R extends Object declared in method <R>mapPartitionsWithIndex(Function2<Integer,Iterator<T>,Iterator<R>>,boolean)
    T extends Object declared in class AbstractJavaRDDLike
    This extends JavaRDDLike<T,This> declared in class AbstractJavaRDDLike

When I try to put the boolean arg in there as a fourth type argument, new Function2<Integer, Iterator<Integer>, Iterator<TholdDropResult>, Boolean>(), I get an error:

error: wrong number of type arguments; required 3
            JavaRDD<TholdDropResult> dropResultsN = dataSetN.mapPartitionsWithIndex(new Function2<Integer, Iterator<Integer>, Iterator<TholdDropResult>, Boolean>(){

Finally, if I use boolean instead of Boolean, I get yet another error:

error: unexpected type
            JavaRDD<TholdDropResult> dropResultsN = dataSetN.mapPartitionsWithIndex(new Function2<Integer, Iterator<Integer>, Iterator<TholdDropResult>, boolean>(){
                                                                                                                                                         ^
  required: reference
  found:    boolean

error: wrong number of type arguments; required 3
            JavaRDD<TholdDropResult> dropResultsN = dataSetN.mapPartitionsWithIndex(new Function2<Integer, Iterator<Integer>, Iterator<TholdDropResult>, boolean>(){

2 Answers:

Answer 0 (score: 1)

You need to close the > of the anonymous Function2 after its three type arguments, and pass the boolean separately as the second argument to mapPartitionsWithIndex:

JavaRDD<TholdDropResult> dropResultsN =
   dataSetN.mapPartitionsWithIndex(new Function2<Integer,
                                                 Iterator<Integer>,
                                                 Iterator<TholdDropResult>>() {
       // ... call(...) implementation ...
   }, false);

The signature of mapPartitionsWithIndex looks like this:

<R> JavaRDD<R> mapPartitionsWithIndex(Function2<java.lang.Integer,
                                                java.util.Iterator<T>,
                                                java.util.Iterator<R>> f,
                                      boolean preservesPartitioning)

Function2 takes an Integer and an Iterator<T> and returns an Iterator<R>. The expected boolean is a parameter of mapPartitionsWithIndex itself, not something declared inside Function2.
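To make the type-argument-versus-method-argument distinction concrete, here is a minimal, self-contained sketch. Note that the Function2 interface and mapPartitionsWithIndex method below are simplified stand-ins written for this demo, not Spark's real classes: the interface has exactly three type parameters, and the boolean travels as an ordinary second method argument after the anonymous class closes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class Function2Demo {
    // Stand-in mirroring org.apache.spark.api.java.function.Function2<T1, T2, R>:
    // exactly three type parameters; there is no slot for a boolean here.
    public interface Function2<T1, T2, R> {
        R call(T1 v1, T2 v2) throws Exception;
    }

    // Stand-in mirroring JavaRDD.mapPartitionsWithIndex: the boolean is an
    // ordinary second METHOD argument, not a fourth type argument.
    public static <T, R> List<R> mapPartitionsWithIndex(
            List<T> partition,
            Function2<Integer, Iterator<T>, Iterator<R>> f,
            boolean preservesPartitioning) throws Exception {
        List<R> out = new ArrayList<>();
        Iterator<R> it = f.call(0, partition.iterator());
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        List<String> result = mapPartitionsWithIndex(
                Arrays.asList(1, 2, 3),
                new Function2<Integer, Iterator<Integer>, Iterator<String>>() {
                    @Override
                    public Iterator<String> call(Integer partitionID,
                                                 Iterator<Integer> ints) {
                        List<String> out = new ArrayList<>();
                        while (ints.hasNext()) {
                            out.add("v" + ints.next());
                        }
                        return out.iterator();
                    }
                },
                false); // <-- the boolean goes here, after "(){...}" closes
        System.out.println(result); // [v1, v2, v3]
    }
}
```

The compiler error in the question arises precisely because that trailing boolean was missing: the actual and formal argument lists differed in length, so R could not be inferred.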

Answer 1 (score: 0)

This works; not sure why, but separating the Function2 out into its own variable does the trick (though admittedly I have not yet compiled and run it).

        // Note: declared with the raw Function2 type; the fully parameterized
        // Function2<Integer, Iterator<Integer>, Iterator<TholdDropResult>> also works.
        Function2 makeLIThenDropResults = new Function2<Integer,
                                                        Iterator<Integer>,
                                                        Iterator<TholdDropResult>>() {
            @Override
            public Iterator<TholdDropResult> call(Integer partitionID, Iterator<Integer> integerIterator) throws Exception {
                SparkDoDrop standin = makeNewSparkDoDrop();

                standin.initializeLI();
                List<TholdDropResult> rddToReturn = new ArrayList<>();
                while (integerIterator.hasNext()){
                    rddToReturn.add(standin.call(integerIterator.next()));
                }
                return rddToReturn.iterator();
            }
        };

        // now make the RDD of subset of N
        // setup bogus arrays of size N for parallelize to lead to dropResultsN
        JavaRDD<TholdDropResult> dropResultsN = dataSetN.mapPartitionsWithIndex(makeLIThenDropResults, true);

(Hat tip to this answer on Apache Spark mapPartitionsWithIndex.)
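Since Spark's Function2 is a single-method interface, the same call can also be written as a Java 8 lambda, which sidesteps the anonymous-class angle-bracket confusion entirely. This is only a sketch reusing the question's own types (TholdDropResult, SparkDoDrop, makeNewSparkDoDrop), so it has not been compiled against the asker's project:

```java
// Lambda form: the three type arguments are inferred from the parameter
// types, and the boolean is still the second argument to the method.
JavaRDD<TholdDropResult> dropResultsN = dataSetN.mapPartitionsWithIndex(
        (Integer partitionID, Iterator<Integer> integerIterator) -> {
            SparkDoDrop standin = makeNewSparkDoDrop();
            standin.initializeLI();
            List<TholdDropResult> rddToReturn = new ArrayList<>();
            while (integerIterator.hasNext()) {
                rddToReturn.add(standin.call(integerIterator.next()));
            }
            return rddToReturn.iterator();
        },
        true); // preservesPartitioning
```

With the lambda there is no raw type involved, so the call compiles without the unchecked warnings the separated-variable approach produces.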