Spark 2.0 state: COMPLETE, exit status -100 on YARN

Time: 2017-01-18 18:05:45

Tags: apache-spark emr

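Can someone point me to documentation on what the -100 exit code means? This is an EMR cluster running Spark 2.0.0 on YARN (the standard EMR Spark cluster deployment). I have looked at https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_sg_yarn_container_exec_errors.html, which lists some error codes, but -100 is not among them. Also, as a more general question, neither the YARN container logs nor the Spark container logs seem to contain much information about what causes this kind of failure. From the YARN logs I see: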

17/01/18 17:51:58 INFO YarnAllocator: Canceling requests for 1 executor container(s) to have a new desired total 4164 executors.
17/01/18 17:51:58 INFO YarnAllocator: Driver requested a total number of 4163 executor(s).
17/01/18 17:51:58 INFO YarnAllocator: Canceling requests for 1 executor container(s) to have a new desired total 4163 executors.
17/01/18 17:51:58 INFO YarnAllocator: Driver requested a total number of 4162 executor(s).
17/01/18 17:51:58 INFO YarnAllocator: Canceling requests for 1 executor container(s) to have a new desired total 4162 executors.
17/01/18 17:51:59 INFO YarnAllocator: Driver requested a total number of 4161 executor(s).
17/01/18 17:51:59 INFO YarnAllocator: Driver requested a total number of 4160 executor(s).
17/01/18 17:51:59 INFO YarnAllocator: Canceling requests for 2 executor container(s) to have a new desired total 4160 executors.
17/01/18 17:52:00 INFO YarnAllocator: Driver requested a total number of 4159 executor(s).
17/01/18 17:52:00 INFO YarnAllocator: Canceling requests for 1 executor container(s) to have a new desired total 4159 executors.
17/01/18 17:52:00 INFO YarnAllocator: Completed container container_1483555419510_0037_01_000114 on host: ip-172-20-221-152.us-west-2.compute.internal (state: COMPLETE, exit status: -100)
17/01/18 17:52:00 WARN YarnAllocator: Container marked as failed: container_1483555419510_0037_01_000114 on host: ip-172-20-221-152.us-west-2.compute.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
17/01/18 17:52:00 INFO YarnAllocator: Completed container container_1483555419510_0037_01_000107 on host: ip-172-20-221-152.us-west-2.compute.internal (state: COMPLETE, exit status: -100)
17/01/18 17:52:00 WARN YarnAllocator: Container marked as failed: container_1483555419510_0037_01_000107 on host: ip-172-20-221-152.us-west-2.compute.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
17/01/18 17:52:00 INFO YarnAllocator: Will request 2 executor containers, each with 7 cores and 22528 MB memory including 2048 MB overhead
17/01/18 17:52:00 INFO YarnAllocator: Canceled 0 container requests (locality no longer needed)
17/01/18 17:52:00 INFO YarnAllocator: Submitted container request (host: Any, capability: <memory:22528, vCores:7>)
17/01/18 17:52:00 INFO YarnAllocator: Submitted container request (host: Any, capability: <memory:22528, vCores:7>)
17/01/18 17:52:01 INFO YarnAllocator: Driver requested a total number of 4158 executor(s).
17/01/18 17:52:01 INFO YarnAllocator: Canceling requests for 1 executor container(s) to have a new desired total 4158 executors.
17/01/18 17:52:02 INFO YarnAllocator: Driver requested a total number of 4157 executor(s).

And from the Spark executor logs I see:

17/01/18 17:39:39 INFO MemoryStore: MemoryStore cleared
17/01/18 17:39:39 INFO BlockManager: BlockManager stopped
17/01/18 17:39:39 INFO ShutdownHookManager: Shutdown hook called

Neither of these gives enough information.

1 answer:

Answer 0: (score: 0)

"Exit status: -100. Diagnostics: Container released on a *lost* node" tells you that the node was lost.
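
As a rough sketch of how to read these lines in bulk, the Python snippet below pulls the container ID, host, exit status and diagnostics out of each "Container marked as failed" entry and attaches a symbolic name to the status. The name table is a partial copy of the constants in Hadoop's org.apache.hadoop.yarn.api.records.ContainerExitStatus class, where -100 is ABORTED (containers released by the application or lost because of node failures). The names FAILED_RE and summarize_failures are made up for this illustration, and the regular expression assumes one log entry per line in the format shown in the question.

```python
import re

# Partial copy of Hadoop's ContainerExitStatus constants (not exhaustive).
YARN_EXIT_STATUS = {
    0: "SUCCESS",
    -1000: "INVALID",
    -100: "ABORTED",
    -101: "DISKS_FAILED",
    -102: "PREEMPTED",
    -103: "KILLED_EXCEEDED_VMEM",
    -104: "KILLED_EXCEEDED_PMEM",
}

# Hypothetical pattern for the YarnAllocator WARN line quoted in the question.
FAILED_RE = re.compile(
    r"Container marked as failed: (?P<container>container_\w+) "
    r"on host: (?P<host>\S+?)\. "
    r"Exit status: (?P<status>-?\d+)\.\s*"
    r"(?:Diagnostics: (?P<diag>.*))?"
)


def summarize_failures(log_text):
    """Return one dict per failed container found in log_text."""
    rows = []
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match is None:
            continue  # not a "marked as failed" entry
        status = int(match.group("status"))
        rows.append({
            "container": match.group("container"),
            "host": match.group("host"),
            "status": status,
            # Codes missing from the table above are reported as UNKNOWN.
            "status_name": YARN_EXIT_STATUS.get(status, "UNKNOWN"),
            "diagnostics": (match.group("diag") or "").strip(),
        })
    return rows


# Two entries copied from the question's driver log.
SAMPLE = (
    "17/01/18 17:52:00 WARN YarnAllocator: Container marked as failed: "
    "container_1483555419510_0037_01_000114 on host: "
    "ip-172-20-221-152.us-west-2.compute.internal. Exit status: -100. "
    "Diagnostics: Container released on a *lost* node\n"
    "17/01/18 17:52:00 WARN YarnAllocator: Container marked as failed: "
    "container_1483555419510_0037_01_000107 on host: "
    "ip-172-20-221-152.us-west-2.compute.internal. Exit status: -100. "
    "Diagnostics: Container released on a *lost* node\n"
)

for row in summarize_failures(SAMPLE):
    print(row["container"], row["host"], row["status_name"], row["diagnostics"])
```

Both failed containers in the question sit on the same host, which fits the answer's reading that the node itself went away. This log does not say why the node was lost; the ResourceManager and NodeManager logs for that host are the more likely place to find the cause.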