Receiver failure when the WAL is stored on S3

Asked: 2015-08-12 15:55:38

Tags: spark-streaming wal

Here is the error log -

    org.apache.spark.SparkException: Could not read data from write ahead log record FileBasedWriteAheadLogSegment(s3n://*****/checkpoint/receivedData/20/log-1439298698600-1439298758600,13678,5069)
        at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.org$apache$spark$streaming$rdd$WriteAheadLogBackedBlockRDD$$getBlockFromWriteAheadLog$1(WriteAheadLogBackedBlockRDD.scala:144)
        at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:168)
        at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:168)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.compute(WriteAheadLogBackedBlockRDD.scala:168)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.NullPointerException
        at org.apache.spark.streaming.util.FileBasedWriteAheadLog.read(FileBasedWriteAheadLog.scala:106)
        at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.org$apache$spark$streaming$rdd$WriteAheadLogBackedBlockRDD$$getBlockFromWriteAheadLog$1(WriteAheadLogBackedBlockRDD.scala:141)
        ... 22 more

Note: reading from the WAL works fine when HDFS is used as the storage.
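For reference, the working HDFS-backed setup can be sketched as the following configuration (a sketch only; the checkpoint path is a placeholder, not taken from the question):

```
# spark-defaults.conf -- enable the receiver write-ahead log (off by default)
spark.streaming.receiver.writeAheadLog.enable  true
```

With the WAL enabled, Spark Streaming writes log segments under `<checkpointDir>/receivedData/<streamId>/`, so calling `ssc.checkpoint(...)` with an HDFS path keeps the segments on HDFS, while pointing the checkpoint at an `s3n://` path produces the failure above.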

Any help is much appreciated.

1 answer:

Answer 0 (score: 1)

It looks like the WAL does not currently support S3, and the Spark team is working on a WAL-free configuration.

https://issues.apache.org/jira/browse/SPARK-9215