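(This comes from trying to map an Integer RDD to a TholdDropResult RDD. We need to initialize a single SparkDoDrop and use it to generate all 10^8 TholdDropResults, hence mapPartitionsWithIndex, which is the only Java flavor of mapPartition that provides the function type we need.)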
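The reason to prefer a per-partition function over a per-element map is that costly setup runs once per partition instead of once per element. Here is a minimal plain-Java sketch of that pattern with no Spark dependency. Partitions are modeled as a `List<List<Integer>>`, and `ExpensiveModel` is a hypothetical stand-in for `SparkDoDrop` that counts how often it is constructed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class PerPartitionInit {
    // Hypothetical stand-in for SparkDoDrop: costly to construct, cheap to call.
    static class ExpensiveModel {
        static int instances = 0;
        ExpensiveModel() { instances++; }              // pretend this is the slow initializeLI()
        String call(int x) { return "result-" + x; }
    }

    // Shaped like a mapPartitions-style function: one model per partition,
    // then every element of that partition is run through it.
    static Iterator<String> processPartition(Iterator<Integer> it) {
        ExpensiveModel model = new ExpensiveModel();
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(model.call(it.next()));
        }
        return out.iterator();
    }

    // Runs processPartition over each "partition" and concatenates the results.
    static List<String> runAll(List<List<Integer>> partitions) {
        List<String> results = new ArrayList<>();
        for (List<Integer> p : partitions) {
            processPartition(p.iterator()).forEachRemaining(results::add);
        }
        return results;
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(Arrays.asList(1, 2, 3), Arrays.asList(4, 5));
        List<String> results = runAll(partitions);
        // prints "5 results, 2 model(s) built"
        System.out.println(results.size() + " results, " + ExpensiveModel.instances + " model(s) built");
    }
}
```

Five elements spread over two partitions build only two models; a per-element map would build five.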
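Problem: I am using org.apache.spark.api.java.function.Function2, and I can't figure out how to get the "boolean" into new Function2.
Here is the code I tried. The new Function2 declaration seems to be what is causing the trouble (builder-style formatting added from the answer):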
JavaRDD<TholdDropResult> dropResultsN = dataSetN.mapPartitionsWithIndex(
        new Function2<Integer,
                      Iterator<Integer>,
                      Iterator<TholdDropResult>>() {
            @Override
            public Iterator<TholdDropResult> call(Integer partitionID, Iterator<Integer> integerIterator) throws Exception {
                SparkDoDrop standin = makeNewSparkDoDrop();
                standin.initializeLI();
                List<TholdDropResult> rddToReturn = new ArrayList<>();
                while (integerIterator.hasNext()) {
                    rddToReturn.add(standin.call(integerIterator.next()));
                }
                return rddToReturn.iterator();
            }
        });
dropResultsN.persist(StorageLevel.MEMORY_ONLY());
Here is the full error when I run gradle build:
JavaRDD<TholdDropResult> dropResultsN = dataSetN.mapPartitionsWithIndex(new Function2<Integer, Iterator<Integer>, Iterator<TholdDropResult>>(){
required: Function2<Integer,Iterator<Integer>,Iterator<R>>,boolean
found: <anonymous Function2<Integer,Iterator<Integer>,Iterator<TholdDropResult>>>
reason: cannot infer type-variable(s) R
(actual and formal argument lists differ in length)
where R,T,This are type-variables:
R extends Object declared in method <R>mapPartitionsWithIndex(Function2<Integer,Iterator<T>,Iterator<R>>,boolean)
T extends Object declared in class AbstractJavaRDDLike
This extends JavaRDDLike<T,This> declared in class AbstractJavaRDDLike
When I try to put the boolean argument in there:
new Function2<Integer, Iterator<Integer>, Iterator<TholdDropResult>, Boolean>()
I get an error:
error: wrong number of type arguments; required 3
JavaRDD<TholdDropResult> dropResultsN = dataSetN.mapPartitionsWithIndex(new Function2<Integer, Iterator<Integer>, Iterator<TholdDropResult>, Boolean>(){
Finally, if I use boolean instead of Boolean, I get yet another error:
error: unexpected type
JavaRDD<TholdDropResult> dropResultsN = dataSetN.mapPartitionsWithIndex(new Function2<Integer, Iterator<Integer>, Iterator<TholdDropResult>, boolean>(){
^
required: reference
found: boolean
error: wrong number of type arguments; required 3
JavaRDD<TholdDropResult> dropResultsN = dataSetN.mapPartitionsWithIndex(new Function2<Integer, Iterator<Integer>, Iterator<TholdDropResult>, boolean>(){
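Both of these errors come from Java's generics rules rather than from Spark. A generic type must receive exactly as many type arguments as it declares, and every type argument must be a reference type. That is why `boolean` can never appear inside the angle brackets. The boxed `Boolean` can, but only by filling one of the three declared slots. The small sketch below compiles without Spark because it uses a hypothetical local `Function2` interface. That interface mirrors the shape of Spark's `org.apache.spark.api.java.function.Function2<T1, T2, R>`, whose single method is `R call(T1 v1, T2 v2) throws Exception`.

```java
class TypeArgDemo {
    // Hypothetical stand-in with the same three-type-parameter shape as Spark's Function2.
    interface Function2<T1, T2, R> {
        R call(T1 v1, T2 v2) throws Exception;
    }

    // Boolean is legal here because it fills the R (return type) slot;
    // the lambda's primitive boolean result is autoboxed to Boolean.
    static final Function2<Integer, Integer, Boolean> GREATER = (a, b) -> a > b;

    // These would NOT compile:
    //   Function2<Integer, Integer, Integer, Boolean>  -> wrong number of type arguments; required 3
    //   Function2<Integer, Integer, boolean>           -> unexpected type; required: reference

    public static void main(String[] args) throws Exception {
        System.out.println(GREATER.call(3, 2)); // true
    }
}
```

So a fourth type argument will never express mapPartitionsWithIndex's flag. That flag has to be passed as an ordinary method argument instead.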
Answer 0 (score: 1)
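You need to close Function2's type-argument list with > and then supply the boolean as a separate argument to mapPartitionsWithIndex, after the function object:
JavaRDD<TholdDropResult> dropResultsN =
    dataSetN.mapPartitionsWithIndex(new Function2<Integer,
                                                  Iterator<Integer>,
                                                  Iterator<TholdDropResult>>() {
        // ... same call(...) implementation as in the question ...
    }, true);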
The signature of mapPartitionsWithIndex looks like this:
<R> JavaRDD<R> mapPartitionsWithIndex(Function2<java.lang.Integer,
java.util.Iterator<T>,
java.util.Iterator<R>> f,
boolean preservesPartitioning)
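Function2 takes an Integer and an Iterator<T> and returns an Iterator<R>. The expected boolean (preservesPartitioning) is a parameter of mapPartitionsWithIndex itself. It is not defined inside Function2.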
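The corrected call shape can be compiled and run without a Spark dependency using the minimal sketch below. `Function2` is a hypothetical local interface shaped like Spark's. The static `mapPartitionsWithIndex` helper is a hypothetical stand-in with the same function-plus-boolean parameters as the signature above. Unlike Spark's method, it takes the "partitions" as an extra first argument and ignores the flag. The point is where the brackets and the `true` go: the type-argument list closes with `>>()`, the anonymous class body follows, and the boolean is the next method argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class WithIndexDemo {
    // Hypothetical stand-in mirroring Spark's Function2<T1, T2, R>.
    interface Function2<T1, T2, R> {
        R call(T1 v1, T2 v2) throws Exception;
    }

    // Hypothetical stand-in with the same (f, preservesPartitioning) parameters as
    //   <R> JavaRDD<R> mapPartitionsWithIndex(Function2<Integer, Iterator<T>, Iterator<R>> f,
    //                                         boolean preservesPartitioning)
    // "Partitions" are plain lists; the flag is accepted but has no effect here.
    static <T, R> List<R> mapPartitionsWithIndex(List<List<T>> partitions,
                                                 Function2<Integer, Iterator<T>, Iterator<R>> f,
                                                 boolean preservesPartitioning) throws Exception {
        List<R> out = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) {
            Iterator<R> it = f.call(i, partitions.get(i).iterator());
            while (it.hasNext()) {
                out.add(it.next());
            }
        }
        return out;
    }

    static List<String> run() throws Exception {
        List<List<Integer>> data = Arrays.asList(Arrays.asList(10, 20), Arrays.asList(30));
        // The type-argument list closes with ">>()", the class body follows,
        // and the boolean comes after the function object as a method argument.
        return mapPartitionsWithIndex(data, new Function2<Integer, Iterator<Integer>, Iterator<String>>() {
            @Override
            public Iterator<String> call(Integer partitionID, Iterator<Integer> values) {
                List<String> tagged = new ArrayList<>();
                while (values.hasNext()) {
                    tagged.add(partitionID + ":" + values.next());
                }
                return tagged.iterator();
            }
        }, true);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // [0:10, 0:20, 1:30]
    }
}
```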
Answer 1 (score: 0)
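This works. I'm not sure why, but pulling the Function2 out into its own variable fixes the problem (granted, I haven't compiled and run it yet).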
// Declared with its full parameterized type; a raw Function2 here only produces unchecked warnings.
Function2<Integer, Iterator<Integer>, Iterator<TholdDropResult>> makeLIThenDropResults =
        new Function2<Integer,
                      Iterator<Integer>,
                      Iterator<TholdDropResult>>() {
            @Override
            public Iterator<TholdDropResult> call(Integer partitionID, Iterator<Integer> integerIterator) throws Exception {
                SparkDoDrop standin = makeNewSparkDoDrop();
                standin.initializeLI();
                List<TholdDropResult> rddToReturn = new ArrayList<>();
                while (integerIterator.hasNext()) {
                    rddToReturn.add(standin.call(integerIterator.next()));
                }
                return rddToReturn.iterator();
            }
        };
// now make the RDD of subset of N
// setup bogus arrays of size N for parallelize to lead to dropResultsN
JavaRDD<TholdDropResult> dropResultsN = dataSetN.mapPartitionsWithIndex(makeLIThenDropResults, true);
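The last line compiles because `true` is passed as the second argument to mapPartitionsWithIndex. Moving the function into a variable is not what fixes it. Declaring that variable with its full parameterized type, rather than as a raw `Function2`, also avoids unchecked warnings. Spark's Java `Function2` has a single abstract method, so a Java 8 lambda can replace the anonymous class. The sketch below runs without Spark. `Function2` and the static `mapPartitionsWithIndex` helper are hypothetical stand-ins, and the helper takes the partitions as an extra first argument. The Spark form appears only in a comment and is untested here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class LambdaDemo {
    // Hypothetical stand-in mirroring Spark's Function2<T1, T2, R>.
    interface Function2<T1, T2, R> {
        R call(T1 v1, T2 v2) throws Exception;
    }

    // Hypothetical stand-in for JavaRDD.mapPartitionsWithIndex(f, preservesPartitioning);
    // "partitions" are plain lists and the flag is ignored.
    static <T, R> List<R> mapPartitionsWithIndex(List<List<T>> partitions,
                                                 Function2<Integer, Iterator<T>, Iterator<R>> f,
                                                 boolean preservesPartitioning) throws Exception {
        List<R> out = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) {
            f.call(i, partitions.get(i).iterator()).forEachRemaining(out::add);
        }
        return out;
    }

    // Fully parameterized type (not a raw Function2), implemented as a lambda.
    static final Function2<Integer, Iterator<Integer>, Iterator<Integer>> DOUBLE_ALL =
            (partitionID, values) -> {
                List<Integer> doubled = new ArrayList<>();
                values.forEachRemaining(v -> doubled.add(v * 2));
                return doubled.iterator();
            };

    static List<Integer> run() throws Exception {
        List<List<Integer>> data = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3));
        // With Spark's own Function2 this would read (not run here):
        //   JavaRDD<Integer> out = dataSetN.mapPartitionsWithIndex(DOUBLE_ALL, true);
        return mapPartitionsWithIndex(data, DOUBLE_ALL, true);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // [2, 4, 6]
    }
}
```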