The following method works fine when run from a fresh spark-shell REPL session. However, when I try to compile a class containing this method, I get the following errors:
Error:(21, 50) value values is not a member of org.apache.spark.rdd.RDD[(Long, org.apache.spark.mllib.recommendation.Rating)]
    val training = ratings.filter(x => x._1 < 6).values.repartition(numPartitions).persist
Error:(22, 65) value values is not a member of org.apache.spark.rdd.RDD[(Long, org.apache.spark.mllib.recommendation.Rating)]
    val validation = ratings.filter(x => x._1 >= 6 && x._1 < 8).values.repartition(numPartitions).persist
Error:(23, 47) value values is not a member of org.apache.spark.rdd.RDD[(Long, org.apache.spark.mllib.recommendation.Rating)]
    val test = ratings.filter(x => x._1 >= 8).values.persist
In both cases I am using Spark 1.0.1. The code itself is as follows:
def createDataset(ratings: RDD[Tuple2[Long, Rating]]): List[RDD[Rating]] = {
  val training = ratings.filter(x => x._1 < 6).values.repartition(numPartitions).persist
  val validation = ratings.filter(x => x._1 >= 6 && x._1 < 8).values.repartition(numPartitions).persist
  val test = ratings.filter(x => x._1 >= 8).values.persist
  val numTraining = training.count
  val numValidation = validation.count
  val numTest = test.count
  println("Number Of Training ::: " + numTraining + " numValidation ::: " + numValidation + " ::: " + numTest)
  List(training, validation, test)
}
It comes from the MLlib tutorial (slightly adapted), and I cannot figure out what is going wrong.
Answer 0 (score: 5)
You need to add this line to your code:
import org.apache.spark.SparkContext._
This imports the implicit conversion to PairRDDFunctions, which is what lets you call values on an RDD of pairs. The spark REPL does this import for you automatically, which is why you don't see the error in the interpreter. Specifically, it is this function in SparkContext that performs the conversion:
implicit def rddToPairRDDFunctions[K, V](rdd: RDD[(K, V)])
    (implicit kt: ClassTag[K], vt: ClassTag[V], ord: Ordering[K] = null) = {
  new PairRDDFunctions(rdd)
}
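For completeness, here is a minimal self-contained sketch of the question's method with the import in place. The enclosing object DatasetSplitter and the value of numPartitions are placeholders added for illustration; the original code defines numPartitions elsewhere.

import org.apache.spark.SparkContext._  // brings the rddToPairRDDFunctions implicit into scope
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.recommendation.Rating

object DatasetSplitter {  // hypothetical enclosing object, for illustration only
  val numPartitions = 4   // assumed value; the original defines this elsewhere

  def createDataset(ratings: RDD[(Long, Rating)]): List[RDD[Rating]] = {
    // .values now compiles because the import supplies the PairRDDFunctions conversion
    val training = ratings.filter(x => x._1 < 6).values.repartition(numPartitions).persist
    val validation = ratings.filter(x => x._1 >= 6 && x._1 < 8).values.repartition(numPartitions).persist
    val test = ratings.filter(x => x._1 >= 8).values.persist
    List(training, validation, test)
  }
}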