Spark Streaming NullPointerException in yarn-client mode

Asked: 2016-06-01 10:05:29

Tags: scala apache-spark spark-streaming

Can anyone tell me why the following code throws a NullPointerException in yarn-client mode, while it runs fine in local mode?

    import org.apache.spark.SparkConf
    import org.apache.spark.mllib.feature.HashingTF
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.twitter.TwitterUtils

    val filters = args.takeRight(0)
    val sparkConf = new SparkConf().setAppName("TwitterAnalyzer")
    val ssc = new StreamingContext(sparkConf, Seconds(2))
    // Twitter receiver using the default OAuth credentials (None)
    val stream = TwitterUtils.createStream(ssc, None, filters)
    val training = ssc.textFileStream("/user/hadoop/Training")
    val tf = new HashingTF(numFeatures = 140)
    // Keep English tweets, drop null texts, hash each tweet into a 140-feature vector
    val text = stream.filter(x => x.getLang() == "en").map(x => x.getText)
      .filter(tweet => tweet != null).map(tweet => tf.transform(tweet.split(" ")))

The exception is thrown at the last line of the code above.
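For reference, `Program.scala:40` in the trace below corresponds to one of the closures in that last chained line. A null-safe variant of the same pipeline (a sketch only; whether `getLang` or `getText` can actually return null on the cluster is an assumption the trace alone does not prove) would guard every dereferenced field:

    // Null-safe sketch (assumption: a null status, language, or text slips
    // through in cluster mode); Option guards drop such records instead of
    // throwing a NullPointerException inside the executor.
    val safeText = stream
      .flatMap(x => Option(x))                          // drop null statuses
      .filter(x => Option(x.getLang).exists(_ == "en")) // guard getLang
      .flatMap(x => Option(x.getText))                  // drop null texts
      .map(tweet => tf.transform(tweet.split(" ")))

If the exception disappears with these guards, the source of the null can then be narrowed down field by field.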

Below is the stack trace of the error from the YARN logs (Program is the user class):

    java.lang.NullPointerException
        at com.Program$$anonfun$5.apply(Program.scala:40)
        at com.Program$$anonfun$5.apply(Program.scala:40)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$28.apply(RDD.scala:1328)
        at org.apache.spark.rdd.RDD$$anonfun$take$1$$anonfun$28.apply(RDD.scala:1328)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
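One hint in the trace: the `RDD.take` frames suggest the failing job is an output operation such as `DStream.print()`, which takes the first few elements of each batch. If so, a cheap per-batch probe (hypothetical diagnostic code, not part of the original program) would show whether `getText` comes back null on the cluster even though it never does locally:

    // Hypothetical probe: count statuses whose text is null in each batch,
    // to localize the null before any transformation runs.
    stream.map(s => Option(s).flatMap(x => Option(x.getText))).foreachRDD { rdd =>
      val total = rdd.count()
      val nulls = rdd.filter(_.isEmpty).count()
      println(s"batch: $total statuses, $nulls null texts")
    }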

0 Answers:

No answers yet.