我正在读取实木复合地板文件并将已处理的结果保存到文本文件中。我的一些火花任务因以下错误而失败
19/03/07 19:46:41 WARN TaskSetManager: Lost task 13345.0 in stage 2.0 (TID 13520, yarn-358563238-2-376522073.prod.qarth-yarn-prod-cdc.cdqarth.cdcprod04.prod.walmart.com, executor 128): org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:151)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:79)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [5 minutes]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at com.walmartlabs.qarth.quality.bfd.spark.concurrency.ThreadedConcurrentContext$$anonfun$awaitSliding$1.apply(ThreadedConcurrentContext.scala:20)
at com.walmartlabs.qarth.quality.bfd.spark.concurrency.ThreadedConcurrentContext$$anonfun$awaitSliding$1.apply(ThreadedConcurrentContext.scala:20)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$JoinIterator.next(Iterator.scala:230)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:124)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:123)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:135)
... 8 more
这些失败的任务在重试时会成功,因为我将spark.task.maxFailures设置为20。但是其中一个任务失败了20次,并杀死了我的spark工作。我想了解是哪种配置导致了这些“期货超时”异常。
我应该设置spark.sql.broadcastTimeout吗?我还没有尝试过,但这没有任何意义,因为我的情况下没有广播连接。
sun.java.command org.apache.spark.deploy.SparkSubmit --master yarn --conf spark.shuffle.external.server=true --conf spark.dynamicAllocation.minExecutors=0 --conf spark.dynamicAllocation.initialExecutors=0 --conf spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-spark-executor.properties --conf spark.dynamicAllocation.maxExecutors=200 --conf spark.task.maxFailures=50 --conf spark.dynamicAllocation.enabled=true --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j-spark-driver.properties --class my.org.Main ./myJarFile.jar -configLocation ./myConf.conf