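Spark does not give very informative error messages, but for future reference, this question is for anyone who hits a NullPointerException like the following: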
java.lang.NullPointerException
at org.apache.spark.rdd.RDD.take(RDD.scala:850)
at org.apache.spark.rdd.RDD.first(RDD.scala:862)
at modelBuilding$$anonfun$3.apply(modelBuilding.scala:46)
at modelBuilding$$anonfun$3.apply(modelBuilding.scala:46)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
at scala.collection.Iterator$$anon$20.hasNext(Iterator.scala:634)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$4.apply(RDD.scala:608)
at org.apache.spark.rdd.RDD$$anonfun$4.apply(RDD.scala:608)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:884)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
at org.apache.spark.scheduler.Task.run(Task.scala:53)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Answer 0 (score: 3)
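Thankfully, a variant of this problem has been discussed before.
One way it arises is when an RDD is referenced inside the closure of a transformation such as map or filter, which is not supported. The closure runs on the executors, where the RDD's SparkContext is not available, so calling an action such as first() there fails with a NullPointerException inside RDD.take, as in the trace above.
For example, if my original code was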
val shiftRDD = qtRdd.filter { _.qtExtEvents.qt.date.getTime() != qtRdd.first().qtExtEvents.qt.date.getTime() }
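you have to move the RDD reference out of the closure, so that the action runs once on the driver and the closure captures only the resulting value:
val firstVal = qtRdd.first().qtExtEvents.qt.date.getTime()
val shiftOneqtRdd = qtRdd.filter { _.qtExtEvents.qt.date.getTime() != firstVal }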
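For anyone who wants to try the pattern in isolation, here is a minimal sketch of the same refactoring. The `Event` case class, the `HoistRddAction` object, the `withoutFirstTimestamp` helper, and the `local[2]` master are all assumptions made for illustration; they stand in for the nested `qtExtEvents.qt.date` structure from the question, which is not shown here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object HoistRddAction {
  // Hypothetical record type standing in for the question's nested event structure.
  case class Event(timestamp: Long)

  // Drops every record whose timestamp equals that of the first record.
  def withoutFirstTimestamp(events: RDD[Event]): RDD[Event] = {
    // The action runs here, on the driver, outside any closure.
    val firstTimestamp: Long = events.first().timestamp
    // The closure captures only a Long, not the RDD itself.
    events.filter(_.timestamp != firstTimestamp)
  }

  def main(args: Array[String]): Unit = {
    // Local master chosen only so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("hoist-rdd-action").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val events = sc.parallelize(Seq(Event(100L), Event(200L), Event(100L), Event(300L)))
      val shifted = withoutFirstTimestamp(events)
      println(shifted.collect().map(_.timestamp).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Writing `events.filter(_.timestamp != events.first().timestamp)` instead would put the RDD back inside the closure and should reproduce the failure described above.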