I am running into a problem running a Spark job on a Hadoop/YARN cluster: it runs fine in local mode but fails in cluster mode with a NullPointerException. I am using Spark 1.6.2 and Scala 2.10.6 both locally and on the cluster. The application is a streaming app that consumes data from Kafka. I am able to get data for some batches, but for others the job fails with the null pointer. Here is the snippet where it fails, DevMain.scala:
Line 1: val lines: DStream[(String, Array[Byte])] = myConsumer.createDefaultStream()
Line 2: val keyDeLines = lines.map(lme.aParser)
Here is createDefaultStream():
def createDefaultStream(): DStream[(String, Array[Byte])] = {
  // pull the Kafka consumer settings out of the application properties
  val consumerConfProps = List("zookeeper.connect", "group.id", "zookeeper.connection.timeout.ms")
  val kafkaConf = Utils.getSubProps(props, consumerConfProps)
  // "topics" is a comma-separated list; each topic gets "numthreads" consumer threads
  val topicArray = props.getProperty("topics").split(",")
  val topicMap = topicArray.map((_, props.getProperty("numthreads").toInt)).toMap
  KafkaUtils.createStream[String, Array[Byte], StringDecoder, DefaultDecoder](ssc,
    kafkaConf,
    topicMap,
    StorageLevel.MEMORY_ONLY_SER
  )
}
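For what it is worth, one guard I have been considering (only a sketch against an assumption, not a confirmed fix): if null message values can reach the DStream (Kafka tombstone records, for example, carry a null payload), dropping them before the map would look like this:

// Sketch: drop null tuples / null payloads before parsing.
// Assumes `lines` and `lme.aParser` are exactly as in the snippets above.
val keyDeLines = lines
  .filter(pair => pair != null && pair._2 != null) // guard against null records
  .map(lme.aParser)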
Here is lme.aParser:
def aParser(x: (String, Array[Byte])): Option[Map[String, Any]] = {
  logInfo("Entered lme: ")
  // Injection.invert returns a Try with the decoded event
  val decodeTry = Injection.invert(x._2)
  decodeTry match {
    case Failure(e) =>
      logInfo("Could not decode binary data: " + e.getMessage)
      None
    case Success(eventPojo) =>
      val bs: String = eventPojo.toString
      logInfo("json: " + bs)
      Some(Map("event" -> bs)) // placeholder return value; the original snippet ends here without one
  }
}
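Alternatively, the guard could live in a small wrapper so that a single bad record becomes None instead of killing the task; safeParser below is a hypothetical name of mine, not part of the real code:

// Sketch: null-tolerant wrapper around aParser.
// `safeParser` is a hypothetical helper, not in the original application.
def safeParser(x: (String, Array[Byte])): Option[Map[String, Any]] =
  if (x == null || x._2 == null) None else aParser(x)

With that, line 2 of DevMain would become lines.map(safeParser).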
In the null-pointer case the code never even enters the lme.aParser function: I have put logging on line 1 of aParser and it never prints.
Here is the stack trace:
java.lang.NullPointerException
at DevMain$$anonfun$5.apply(DevMain.scala:2)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1631)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
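Reading the trace, the exception is raised at DevMain.scala:2, i.e. inside the map closure itself, and my log line at the top of aParser never fires. That suggests either the element handed to the closure is null, or the lme reference captured by the closure is null on the executor. A per-batch diagnostic I could run (again only a sketch, reusing `lines` from above):

// Sketch: count null records per batch to confirm whether the stream
// itself delivers nulls. Assumes `lines` is the DStream from above.
lines.foreachRDD { rdd =>
  val nulls = rdd.filter(pair => pair == null || pair._2 == null).count()
  if (nulls > 0) println(s"batch contained $nulls null records")
}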
I am new to running Spark on a cluster; could someone please point me in the right direction?
Note: I know that in the map it is trying to iterate over the elements of the DStream `lines`, and that it fails when a batch is empty. But from the reading I have done, an empty batch on a DStream should not cause a failure; please correct me if I am wrong. I have done my share of searching: some people point to a missing conversion from a Java iterator to a Scala iterator in the Spark code, and others suggest this may be a bug in Spark's serialization code. I am not sure which direction to head.
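To convince myself that an empty batch alone does not throw, a tiny local test could feed one empty RDD through the same map (sketch only; queueStream and emptyRDD are standard Spark APIs):

// Sketch: push a single empty batch through the same map to check
// that an empty batch by itself does not throw.
val emptyRdd = ssc.sparkContext.emptyRDD[(String, Array[Byte])]
val queue = scala.collection.mutable.Queue(emptyRdd)
val testStream = ssc.queueStream(queue)
testStream.map(lme.aParser).print() // should simply print an empty batch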
Answer 0 (score: 0)