I get the error "This RDD lacks a SparkContext" when I invoke a transformation or action

Asked: 2018-01-21 16:19:49

Tags: scala apache-spark spark-streaming rdd

This is probably a basic question; I'm quite new to Spark/Scala.

I have a variable of type Map[String, RDD[Int]]. I can't iterate over it with a for loop and do anything with the RDDs inside: any action or transformation I invoke in the loop body throws an error.

Since the variable is a plain Map rather than an RDD, I assumed that iterating over it with a simple for loop would not count as a transformation, so I'm confused. Here is what the code looks like:

def trendingSets(pairRDD: RDD[(String, Int)]): Map[String, RDD[Int]] = {
  pairRDD
    .groupByKey()
    .mapValues(v => { this.sc.parallelize(v.toList) })
    .take(20)
    .toMap
}

def main(args: Array[String]) {

  val sets = this.trendingSets(pairRDD)

  // inside this loop, no transformations or actions work.
  for ((tag, rdd) <- sets) {
    // For instance this fails:
    // val x = rdd.collect()
  }
}

The error is:

Exception in thread "main" org.apache.spark.SparkException: This RDD lacks a SparkContext. It could happen in the following cases: 
(1) RDD transformations and actions are NOT invoked by the driver, but inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
(2) When a Spark Streaming job recovers from checkpoint, this exception will be hit if a reference to an RDD not defined by the streaming job is used in DStream operations. For more information, See SPARK-13758.
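
From reading SPARK-5063, I suspect my this.sc.parallelize(v.toList) call inside mapValues matches case (1), because it references the SparkContext from within a transformation. As an experiment, keeping the grouped values as plain Scala lists instead of nested RDDs makes the loop work (the name trendingSetsLocal below is just mine for illustration, and this assumes the top-20 groups fit comfortably in driver memory):

import org.apache.spark.rdd.RDD

// Experimental variant (hypothetical name): values stay plain Lists,
// so no RDD is created inside a transformation and nothing captures
// the SparkContext in a closure.
def trendingSetsLocal(pairRDD: RDD[(String, Int)]): Map[String, List[Int]] = {
  pairRDD
    .groupByKey()
    .mapValues(_.toList) // plain List instead of this.sc.parallelize(...)
    .take(20)            // action; returns Array[(String, List[Int])] on the driver
    .toMap
}

That version runs, but I don't know whether it is idiomatic, or whether there is a way to keep the values as real RDDs.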

Any help would be appreciated. Thanks!

0 answers:

No answers