NullPointerException in Spark rdd.RDD.take

Date: 2014-05-01 00:01:59

Tags: nullpointerexception apache-spark

Spark does not surface a very informative error message here, but for future reference, this question is for anyone who hits a NullPointerException with a stack trace like this:

java.lang.NullPointerException
    at org.apache.spark.rdd.RDD.take(RDD.scala:850)
    at org.apache.spark.rdd.RDD.first(RDD.scala:862)
    at modelBuilding$$anonfun$3.apply(modelBuilding.scala:46)
    at modelBuilding$$anonfun$3.apply(modelBuilding.scala:46)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
    at scala.collection.Iterator$$anon$20.hasNext(Iterator.scala:634)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$4.apply(RDD.scala:608)
    at org.apache.spark.rdd.RDD$$anonfun$4.apply(RDD.scala:608)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

1 answer:

Answer 0 (score: 3)

Thankfully, this turned out to be a variant of the issue discussed here:

http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-NullPointerException-met-when-computing-new-RDD-or-use-count-td2766.html

One way this shows up is by referring to an RDD from inside the closure of a map (or filter) call on another RDD, which is not allowed.

For example, if my original code was

// Broken: qtRdd.first() is called inside the filter closure, i.e. an RDD is referenced from within another RDD operation
val shiftRDD = qtRdd.filter { _.qtExtEvents.qt.date.getTime() != qtRdd.first().qtExtEvents.qt.date.getTime() }

then the reference to the RDD has to be refactored out of the closure:

// Fixed: compute the value on the driver first, then close over the plain value instead of the RDD
val firstVal = qtRdd.first().qtExtEvents.qt.date.getTime()
val shiftOneqtRdd = qtRdd.filter { _.qtExtEvents.qt.date.getTime() != firstVal }
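
For reference, here is a minimal, self-contained sketch of the same pattern using a hypothetical RDD of integers (the names `NestedRddReference`, `nums`, and `withoutFirst` are illustrative, not from the original post):

import org.apache.spark.{SparkConf, SparkContext}

object NestedRddReference {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("nested-rdd-reference").setMaster("local[*]"))
    val nums = sc.parallelize(Seq(3, 1, 4, 1, 5, 9))

    // Broken: nums.first() would be evaluated inside the filter closure, which is
    // serialized and run on the executors, where the RDD handle is not usable.
    // val withoutFirst = nums.filter(_ != nums.first())

    // Fixed: pull the value out on the driver, then close over the plain value only.
    val firstVal = nums.first()
    val withoutFirst = nums.filter(_ != firstVal)

    println(withoutFirst.collect().mkString(", "))
    sc.stop()
  }
}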