Spark job error: YarnAllocator: Exit status: -100. Diagnostics: Container released on a *lost* node

Time: 2015-12-05 06:37:54

Tags: amazon-web-services apache-spark yarn emr

I ran a job on AWS EMR 4.1 with Spark 1.5, submitted with the following command:

spark-submit --deploy-mode cluster --master yarn-cluster --driver-memory 200g --driver-cores 30 --executor-memory 70g --executor-cores 8 --num-executors 90 --conf spark.storage.memoryFraction=0.45 --conf spark.shuffle.memoryFraction=0.75 --conf spark.task.maxFailures=1 --conf spark.network.timeout=1800s

The job then failed with the error below. Where can I find out what "Exit status: -100" means, and how can I fix it? Thanks!

15/12/05 05:54:24 INFO TaskSetManager: Finished task 176.0 in stage 957.0 (TID 128408) in 130885 ms on ip-10-155-195-239.ec2.internal (106/800)
15/12/05 05:54:24 INFO YarnAllocator: Completed container container_1449241952863_0004_01_000026 (state: COMPLETE, exit status: -100)
15/12/05 05:54:24 INFO YarnAllocator: Container marked as failed: container_1449241952863_0004_01_000026. Exit status: -100. Diagnostics: Container released on a *lost* node
15/12/05 05:54:24 INFO YarnAllocator: Completed container container_1449241952863_0004_01_000055 (state: COMPLETE, exit status: -100)
15/12/05 05:54:24 INFO YarnAllocator: Container marked as failed: container_1449241952863_0004_01_000055. Exit status: -100. Diagnostics: Container released on a *lost* node
15/12/05 05:54:24 ERROR YarnClusterScheduler: Lost executor 24 on ip-10-147-11-212.ec2.internal: Yarn deallocated the executor 24 (container container_1449241952863_0004_01_000026)
15/12/05 05:54:24 INFO TaskSetManager: Re-queueing tasks for 24 from TaskSet 957.0
15/12/05 05:54:24 WARN TaskSetManager: Lost task 382.0 in stage 957.0 (TID 128614, ip-10-147-11-212.ec2.internal): ExecutorLostFailure (executor 24 lost)
15/12/05 05:54:24 ERROR TaskSetManager: Task 382 in stage 957.0 failed 1 times; aborting job
15/12/05 05:54:24 WARN TaskSetManager: Lost task 208.0 in stage 957.0 (TID 128440, ip-10-147-11-212.ec2.internal): ExecutorLostFailure (executor 24 lost)
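
For reference, -100 matches the ABORTED constant in Hadoop YARN's ContainerExitStatus class, which YARN uses for containers killed by the framework, including containers "lost" because their node failed. The "failed 1 times; aborting job" line is also consistent with spark.task.maxFailures=1 in the submit command, so a single lost executor is enough to abort the job. Below is a minimal Python sketch for pulling the failed containers out of a driver log like the one above. The function name failed_containers and the regular expression are illustrative assumptions, not part of Spark or YARN, and the table lists only a subset of the YARN constants.

```python
import re

# Subset of the constants defined in
# org.apache.hadoop.yarn.api.records.ContainerExitStatus.
YARN_EXIT_STATUS = {
    0: "SUCCESS",
    -100: "ABORTED",               # released by the app or lost with its node
    -101: "DISKS_FAILED",
    -102: "PREEMPTED",
    -103: "KILLED_EXCEEDED_VMEM",
    -104: "KILLED_EXCEEDED_PMEM",
}

# Hypothetical pattern written against the YarnAllocator lines shown above.
FAILED_RE = re.compile(
    r"Container marked as failed: (?P<container>container_\w+)\. "
    r"Exit status: (?P<status>-?\d+)\. Diagnostics: (?P<diag>.*)"
)


def failed_containers(log_text):
    """Return (container_id, exit_status, status_name, diagnostics) tuples."""
    results = []
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match is None:
            continue
        status = int(match.group("status"))
        results.append((
            match.group("container"),
            status,
            YARN_EXIT_STATUS.get(status, "UNKNOWN"),
            match.group("diag").strip(),
        ))
    return results


if __name__ == "__main__":
    sample = (
        "15/12/05 05:54:24 INFO YarnAllocator: Container marked as failed: "
        "container_1449241952863_0004_01_000026. Exit status: -100. "
        "Diagnostics: Container released on a *lost* node"
    )
    for row in failed_containers(sample):
        print(row)
```

The diagnostics text, rather than the numeric code alone, is what separates a lost node from an application-initiated release, so the sketch keeps it in the output.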

0 answers:

No answers yet