Spark NullPointerException in a foreach loop

Posted: 2014-10-27 07:12:38

Tags: scala foreach nullpointerexception apache-spark rdd

I have an RDD that I want to loop over, like this:

pointsMap.foreach({ p =>
  val pointsWithCoordinatesWithDistance = pointsMap.leftOuterJoin(xCoordinatesWithDistance)
  pointsWithCoordinatesWithDistance.foreach(println)
  println("---")
})

However, a NullPointerException occurs:

java.lang.NullPointerException
    at org.apache.spark.rdd.RDD.<init>(RDD.scala:125)
    at org.apache.spark.rdd.CoGroupedRDD.<init>(CoGroupedRDD.scala:69)
    at org.apache.spark.rdd.PairRDDFunctions.cogroup(PairRDDFunctions.scala:651)
    at org.apache.spark.rdd.PairRDDFunctions.leftOuterJoin(PairRDDFunctions.scala:483)
    at org.apache.spark.rdd.PairRDDFunctions.leftOuterJoin(PairRDDFunctions.scala:555)
...

Both pointsMap and xCoordinatesWithDistance are initialized before the foreach and contain elements. Calling leftOuterJoin outside the foreach loop also works. For the full version of my code, see https://github.com/timasjov/spark-learning/blob/master/src/DBSCAN.scala

1 Answer:

Answer 0 (score: 2)

Don't use an RDD inside a function passed to another RDD's operator. The closure you give to foreach is serialized and executed on the executors, where the driver-side state an RDD depends on (such as its SparkContext) is not available; calling pointsMap.leftOuterJoin(...) inside the closure dereferences that missing state, which is what produces the NullPointerException. When you want to operate on multiple RDDs together, invoke the appropriate RDD operator, such as join or leftOuterJoin, from the driver code instead.
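Concretely, the fix for the code in the question is to compute the join once on the driver, outside the foreach: `val joined = pointsMap.leftOuterJoin(xCoordinatesWithDistance)` and then `joined.collect().foreach(println)`. To show what that driver-side join produces, the sketch below simulates leftOuterJoin's semantics with plain Scala collections (no Spark needed); the sample data standing in for pointsMap and xCoordinatesWithDistance is made up for illustration.

```scala
object LeftOuterJoinDemo {
  // Plain-Scala simulation of Spark's pairwise-RDD leftOuterJoin:
  // for every (k, v) on the left, emit (k, (v, Some(w))) for each
  // matching (k, w) on the right, or (k, (v, None)) if there is no match.
  def leftOuterJoin[K, V, W](left: Seq[(K, V)],
                             right: Seq[(K, W)]): Seq[(K, (V, Option[W]))] = {
    val rightByKey = right.groupBy(_._1)
    left.flatMap { case (k, v) =>
      rightByKey.get(k) match {
        case Some(matches) => matches.map { case (_, w) => (k, (v, Some(w))) }
        case None          => Seq((k, (v, None)))
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data in the shape the question describes.
    val pointsMap = Seq((1, "p1"), (2, "p2"))
    val xCoordinatesWithDistance = Seq((1, 0.5))
    leftOuterJoin(pointsMap, xCoordinatesWithDistance).foreach(println)
  }
}
```

The key point is structural, not the join itself: both inputs are plain values on the driver, and the join runs once there, rather than being re-triggered from inside a per-element closure.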