Unable to save an RDD to HDFS in Apache Spark

Time: 2017-09-12 14:37:43

Tags: apache-spark apache-spark-2.0

I get the following error when trying to save an RDD to HDFS:

17/09/13 17:06:42 WARN TaskSetManager: Lost task 7340.0 in stage 16.0 (TID 100118, XXXXXX.com, executor 2358): java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:865)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:401)
        Suppressed: java.lang.IllegalArgumentException: Self-suppression not permitted
                at java.lang.Throwable.addSuppressed(Throwable.java:1043)
                at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
                at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108)
                at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
                at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$8.apply$mcV$sp(PairRDDFunctions.scala:1218)
                at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1359)
                at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1218)
                at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1197)
                at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
                at org.apache.spark.scheduler.Task.run(Task.scala:99)
                at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                at java.lang.Thread.run(Thread.java:748)
        [CIRCULAR REFERENCE:java.io.IOException: Failing write. Tried pipeline recovery 5 times without success.]

The final task in the stage is .saveAsTextFile(), and in the Spark UI I can see that the other tasks before .saveAsTextFile() completed successfully. I am running Spark 2.0.0 in YARN mode.
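
For context, saveAsTextFile goes through TextOutputFormat and SparkHadoopWriter, which is exactly the code path visible in the suppressed frames of the stack trace above. The question does not include the job's code, so the following is only a minimal Scala sketch of the kind of job that exercises this path; the application name, the RDD contents, and the output path hdfs://namenode:8020/user/example/output are all assumptions for illustration:

    import org.apache.spark.{SparkConf, SparkContext}

    object SaveToHdfs {
      def main(args: Array[String]): Unit = {
        // Runs on YARN when submitted with --master yarn.
        val conf = new SparkConf().setAppName("save-rdd-to-hdfs")
        val sc = new SparkContext(conf)

        // Hypothetical data; the real job's RDD is not shown in the question.
        val rdd = sc.parallelize(1 to 1000000).map(i => s"record-$i")

        // Writes one part file per partition via TextOutputFormat.
        // The IOException above is thrown while an executor's HDFS output
        // stream fails to rebuild its datanode write pipeline during close.
        rdd.saveAsTextFile("hdfs://namenode:8020/user/example/output")

        sc.stop()
      }
    }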

Edit: I have already seen the answer to Spark: Self-suppression not permitted when writing big file to HDFS, and I have verified that the issue described in that answer is not the case here.

0 Answers:

No answers yet.