This may be a basic question; I'm new to Spark/Scala.

I have a variable of type Map[String, RDD[Int]]. I can't iterate over it with a for loop and do anything with the RDDs inside the loop: any action or transformation I try to invoke there throws an error. Since the variable itself is not an RDD, I assumed that iterating over the Map with a plain for loop would not count as a transformation, so I'm confused. Here is what the code looks like:
def trendingSets(pairRDD: RDD[(String, Int)]): Map[String, RDD[Int]] = {
  pairRDD
    .groupByKey()
    .mapValues(v => { this.sc.parallelize(v.toList) })
    .take(20)
    .toMap
}
def main(args: Array[String]) {
  val sets = this.trendingSets(pairRDD)
  // inside this loop, no transformations or actions work.
  for ((tag, rdd) <- sets) {
    // For instance this fails:
    // val x = rdd.collect()
  }
}
The error is:
Exception in thread "main" org.apache.spark.SparkException: This RDD lacks a SparkContext. It could happen in the following cases:
(1) RDD transformations and actions are NOT invoked by the driver, but inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
(2) When a Spark Streaming job recovers from checkpoint, this exception will be hit if a reference to an RDD not defined by the streaming job is used in DStream operations. For more information, See SPARK-13758.
Any help would be appreciated. Thanks!
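For what it's worth, here is a minimal sketch of one way the problem described above can be avoided. The original `trendingSets` calls `sc.parallelize` inside `mapValues`, i.e. inside an RDD transformation that runs on the executors, where no SparkContext is available (case (1) of the SPARK-5063 message). Running the `take(20)` action first brings the grouped data to the driver as a plain Scala collection, so the per-key RDDs can then be created on the driver. The object and method names here are illustrative, not from the original post, and the `SparkContext` is passed in explicitly rather than read from a `this.sc` field:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TrendingSetsSketch {
  // Same signature idea as the original trendingSets, but all RDD
  // creation happens on the driver, after the take(20) action.
  def trendingSets(sc: SparkContext,
                   pairRDD: RDD[(String, Int)]): Map[String, RDD[Int]] = {
    pairRDD
      .groupByKey()
      .take(20)                        // action: grouped pairs now on the driver
      .map { case (tag, values) =>     // plain Scala map, runs on the driver
        tag -> sc.parallelize(values.toList)
      }
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("trending-sets-sketch"))
    val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
    val sets  = trendingSets(sc, pairs)
    for ((tag, rdd) <- sets) {
      // collect() now works: these RDDs were created on the driver.
      println(s"$tag -> ${rdd.collect().toList}")
    }
    sc.stop()
  }
}
```

Whether a `Map[String, RDD[Int]]` is the right structure at all is a separate question; for many uses, keeping everything in a single `RDD[(String, Iterable[Int])]` avoids the nested-RDD issue entirely.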