Here is the error log:
org.apache.spark.SparkException: Could not read data from write ahead log record FileBasedWriteAheadLogSegment(s3n://*****/checkpoint/receivedData/20/log-1439298698600-1439298758600,13678,5069)
	at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.org$apache$spark$streaming$rdd$WriteAheadLogBackedBlockRDD$$getBlockFromWriteAheadLog$1(WriteAheadLogBackedBlockRDD.scala:144)
	at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:168)
	at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:168)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.compute(WriteAheadLogBackedBlockRDD.scala:168)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
	at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
	at org.apache.spark.streaming.util.FileBasedWriteAheadLog.read(FileBasedWriteAheadLog.scala:106)
	at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.org$apache$spark$streaming$rdd$WriteAheadLogBackedBlockRDD$$getBlockFromWriteAheadLog$1(WriteAheadLogBackedBlockRDD.scala:141)
	... 22 more
Note: reading from the WAL works fine when HDFS is used as the storage.
Any help is much appreciated.
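For context, the question doesn't include the job setup, but a receiver-based Spark Streaming job with the WAL enabled and an S3 checkpoint directory typically looks like the sketch below; the bucket, app name, and socket source are placeholders, not taken from the original post.

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WalOnS3Sketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("wal-on-s3-sketch") // placeholder app name
      // Enables the receiver write-ahead log; received blocks are then
      // journaled under <checkpointDir>/receivedData, which is the path
      // the failing read in the stack trace points at.
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

    val ssc = new StreamingContext(conf, Seconds(10))
    // Placeholder bucket/prefix; with the WAL enabled, both checkpoint
    // metadata and received data land under this directory.
    ssc.checkpoint("s3n://my-bucket/checkpoint")

    // Any receiver-based source exercises the WAL path; a socket stream
    // is used here purely for illustration.
    val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}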
Answer (score: 1)
It looks like the WAL does not support S3 at the moment, and the Spark team is working on a WAL-less configuration.
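If upgrading is an option: as I recall, later Spark releases (around 1.6) added configuration properties specifically for file systems like S3 that don't support flushing, which close the WAL file after each write instead. A minimal sketch, assuming one of those releases:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
  // Close the WAL file after writing each record instead of relying on
  // flush, which S3-backed file systems do not support. One setting covers
  // the receiver data WAL, the other the driver metadata WAL.
  .set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite", "true")
  .set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite", "true")

The trade-off is that closing the file per write is slower than flushing, so this is only worth it when the WAL must live on an S3-like store.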