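I want to create an RDD that collects the results of an iterative computation.
How can I use a loop (or any alternative) to replace the following code: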
import org.apache.spark.mllib.random.RandomRDDs._
val n = 10
val step1 = normalRDD(sc, n, seed = 1)
val step2 = normalRDD(sc, n, seed = (step1.max).toLong)
val result1 = step1.zip(step2)
val step3 = normalRDD(sc, n, seed = (step2.max).toLong)
val result2 = result1.zip(step3)
...
val step50 = normalRDD(sc, n, seed = (step49.max).toLong)
val result49 = result48.zip(step50)
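(Creating the N step RDDs first and zipping them all together at the end would also be fine, as long as the 50 RDDs are created iteratively so that each one respects the condition seed = (step(n-1).max).)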
Answer 0 (score: 6)
A recursive function would work:
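import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.random.RandomRDDs.normalRDD

/**
 * The return type is an Option to handle the case of a user specifying
 * a non-positive number of steps. Each row of the result holds one value
 * per step, because zipping Double RDDs cannot produce an RDD[Double].
 */
def createZippedNormal(sc: SparkContext,
                       size: Long,
                       numSteps: Int): Option[RDD[Vector[Double]]] = {
  @scala.annotation.tailrec
  def accum(stepsLeft: Int,
            currRDD: RDD[Vector[Double]],
            seed: Long): RDD[Vector[Double]] = {
    if (stepsLeft <= 0) currRDD
    else {
      // `size` is the number of values per RDD; the seed is passed by name
      // because the third positional parameter of normalRDD is numPartitions.
      val newRDD = normalRDD(sc, size, seed = seed)
      // Append the new step to every row; max() returns a Double, hence toLong.
      accum(stepsLeft - 1,
            currRDD.zip(newRDD).map { case (row, x) => row :+ x },
            newRDD.max().toLong)
    }
  }

  if (numSteps <= 0) None
  else {
    // Start from the first step rather than sc.emptyRDD, since zip requires
    // both RDDs to have the same partitioning and element counts.
    val first = normalRDD(sc, size, seed = 1L)
    Some(accum(numSteps - 1, first.map(Vector(_)), first.max().toLong))
  }
}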
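The question also allows building all the step RDDs first and zipping them at the end, so a loop-style alternative may be worth sketching. The version below is only an illustration under some assumptions: `chainedNormals` and `zipAll` are hypothetical helper names, not part of Spark or of the answer above; the first seed is taken to be 1 as in the question; and every RDD is created with the same size and the default number of partitions, which `zip` needs in order to pair elements up.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.random.RandomRDDs.normalRDD

// Hypothetical helper: builds `numSteps` normal RDDs in order, seeding each
// one with the (truncated) max of the previous one.
def chainedNormals(sc: SparkContext,
                   size: Long,
                   numSteps: Int,
                   firstSeed: Long = 1L): Vector[RDD[Double]] = {
  require(numSteps > 0, "numSteps must be positive")
  val first = normalRDD(sc, size, seed = firstSeed).cache()
  (2 to numSteps).foldLeft(Vector(first)) { (acc, _) =>
    // max() is a Spark action, so every step runs one job on the previous RDD.
    val seed = acc.last.max().toLong
    acc :+ normalRDD(sc, size, seed = seed).cache()
  }
}

// Hypothetical helper: zips all steps into one RDD whose rows hold one value
// per step. Assumes `steps` is non-empty and identically partitioned.
def zipAll(steps: Seq[RDD[Double]]): RDD[Vector[Double]] =
  steps.tail.foldLeft(steps.head.map(Vector(_))) { (acc, next) =>
    acc.zip(next).map { case (row, x) => row :+ x }
  }
```

In the Spark shell this could be used as `val result = zipAll(chainedNormals(sc, 10, 50))`. Rows are flat `Vector[Double]` values rather than the nested tuples produced by repeated `zip` calls, which tends to be easier to work with when the number of steps is a runtime parameter. Caching each step is optional; it only avoids regenerating an RDD when its max is computed and again when it is zipped.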