I am running tasks on a Mesos stack with Docker containers.
Sometimes, some tasks fail.
Here are some of the relevant TaskStatus messages and reasons:
message: Container exited with status 1 - reason: REASON_COMMAND_EXECUTOR_FAILED
message: Container exited with status 42 - reason: REASON_COMMAND_EXECUTOR_FAILED
message: Container exited with status 137 - reason: REASON_COMMAND_EXECUTOR_FAILED
Is there a correspondence table that maps the container exit status codes in TaskStatus messages to more explicit errors?
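Answer 0 (score: 4)
A command task can fail for many reasons, and it sets its exit code accordingly. For example, Docker 1.10 sets the following exit status codes (from the documentation and this answer):
The exit code from docker run gives information about why the container failed to run or why it exited. When docker run exits with a non-zero code, the exit codes follow the chroot standard, see below:
125 if the error is with the Docker daemon itself: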
$ docker run --foo busybox; echo $? # flag provided but not defined: --foo See 'docker run --help'.
126 if the contained command cannot be invoked:
$ docker run busybox /etc; echo $? # docker: Error response from daemon: Container command '/etc' could not be invoked.
127 if the contained command cannot be found:
$ docker run busybox foo; echo $? # docker: Error response from daemon: Container command 'foo' not found or does not exist. 127
Exit code of the contained command otherwise:
$ docker run busybox /bin/sh -c 'exit 3'; echo $? # 3
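The rules quoted above can be summed up in a small lookup. This is only an illustrative Python sketch; the function name `classify_docker_run_exit` is made up for this example and is not part of Docker or Mesos.

```python
def classify_docker_run_exit(code: int) -> str:
    """Map a `docker run` exit code to the category quoted above.

    Hypothetical helper: only the 125/126/127 rules come from the
    quoted documentation; any other value is the exit code of the
    command that ran inside the container.
    """
    if code == 0:
        return "success"
    if code == 125:
        return "error with the Docker daemon itself"
    if code == 126:
        return "contained command cannot be invoked"
    if code == 127:
        return "contained command cannot be found"
    return f"exit code {code} of the contained command"


if __name__ == "__main__":
    for code in (125, 126, 127, 3):
        print(code, "->", classify_docker_run_exit(code))
```

Note that 126 and 127 are ambiguous by nature: a contained command may itself exit with those values.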
Another set of exit code rules can be found here:
| Code | Meaning | Example | Comments |
|-------|--------------------------------|-------------------------|--------------------------------------------------------------------------------------------------------------|
| 1 | Catchall for general errors | let "var1 = 1/0" | Miscellaneous errors, such as "divide by zero" and other impermissible operations |
| 2 | Misuse of shell builtins | empty_function() {} | Missing keyword or command, or permission problem (and diff return code on a failed binary file comparison). |
| 126 | Command invoked cannot execute | /dev/null | Permission problem or command is not an executable |
| 127 | "command not found" | illegal_command | Possible problem with $PATH or a typo |
| 128 | Invalid argument to exit | exit 3.14159 | exit takes only integer args in the range 0 - 255 (see first footnote) |
| 128+n | Fatal error signal "n" | kill -9 $PPID of script | $? returns 137 (128 + 9) |
| 130 | Script terminated by Control-C | Ctl-C | Control-C is fatal error signal 2, (130 = 128 + 2, see above) |
| 255* | Exit status out of range | exit -1 | exit takes only integer args in the range 0 - 255 |
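Based on your examples:
137: 128 + 9 = 137 (9 coming from SIGKILL), which could be the result of an Out Of Memory error followed by a kill.
1 and 42 (the "Answer to the Ultimate Question of Life, the Universe, and Everything"): exit codes of the contained command, probably due to invalid configuration, an internal application error, or invalid input.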
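As a rough illustration of the 128 + n rule applied to the three statuses from the question, here is a minimal Python sketch. `describe_exit_status` is a hypothetical helper, and the signal names come from Python's standard `signal` module, so they depend on the platform the script runs on.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Give a rough interpretation of a container exit status.

    Assumes the shell convention from the table above: a status of
    128 + n means the process was terminated by signal n.
    """
    if status == 0:
        return "success"
    if 128 < status < 255:
        n = status - 128
        try:
            name = signal.Signals(n).name  # e.g. SIGKILL for 9 on Linux
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {n}"
        return f"terminated by {name} (128 + {n})"
    return f"application exit code {status}"


if __name__ == "__main__":
    for status in (1, 42, 137):
        print(status, "->", describe_exit_status(status))
```

A status of 137 only says that the process received SIGKILL; whether the cause was an OOM kill has to be confirmed elsewhere, for example in the TaskStatus message.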
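If you need more information to interpret a status code, look at the message field of the Mesos TaskStatus update; for example, Mesos puts information about an OOM there. The same information can be found in the Mesos logs. To debug why a command returned a non-zero code, inspect the files stored in the executor sandbox, especially stderr / stdout or any command-specific logs.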
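If you want to pull the numeric status out of the message field programmatically, something like the following Python sketch may help. It assumes the message looks like the three examples in the question ("Container exited with status N"); Mesos may word other messages differently, and `exit_status_from_message` is a hypothetical name.

```python
import re
from typing import Optional

# Pattern inferred from the example messages in the question only.
_EXIT_STATUS_RE = re.compile(r"Container exited with status (\d+)")


def exit_status_from_message(message: str) -> Optional[int]:
    """Return the exit status embedded in a TaskStatus message, if any."""
    match = _EXIT_STATUS_RE.search(message)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(exit_status_from_message("Container exited with status 137"))
```

Returning `None` for messages that do not match keeps the caller from mistaking an unrelated message for a status of 0.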
Answer 1 (score: 1)
I guess you want to look at enum Reason in mesos.proto (copied below):
enum Reason {
// TODO(jieyu): The default value when a caller doesn't check for
// presence is 0 and so ideally the 0 reason is not a valid one.
// Since this is not used anywhere, consider removing this reason.
REASON_COMMAND_EXECUTOR_FAILED = 0;
REASON_CONTAINER_LAUNCH_FAILED = 21;
REASON_CONTAINER_LIMITATION = 19;
REASON_CONTAINER_LIMITATION_DISK = 20;
REASON_CONTAINER_LIMITATION_MEMORY = 8;
REASON_CONTAINER_PREEMPTED = 17;
REASON_CONTAINER_UPDATE_FAILED = 22;
REASON_EXECUTOR_REGISTRATION_TIMEOUT = 23;
REASON_EXECUTOR_REREGISTRATION_TIMEOUT = 24;
REASON_EXECUTOR_TERMINATED = 1;
REASON_EXECUTOR_UNREGISTERED = 2;
REASON_FRAMEWORK_REMOVED = 3;
REASON_GC_ERROR = 4;
REASON_INVALID_FRAMEWORKID = 5;
REASON_INVALID_OFFERS = 6;
REASON_IO_SWITCHBOARD_EXITED = 27;
REASON_MASTER_DISCONNECTED = 7;
REASON_RECONCILIATION = 9;
REASON_RESOURCES_UNKNOWN = 18;
REASON_SLAVE_DISCONNECTED = 10;
REASON_SLAVE_REMOVED = 11;
REASON_SLAVE_RESTARTED = 12;
REASON_SLAVE_UNKNOWN = 13;
REASON_TASK_CHECK_STATUS_UPDATED = 28;
REASON_TASK_GROUP_INVALID = 25;
REASON_TASK_GROUP_UNAUTHORIZED = 26;
REASON_TASK_INVALID = 14;
REASON_TASK_UNAUTHORIZED = 15;
REASON_TASK_UNKNOWN = 16;
}
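If you receive the reason as a number and want its name back, a lookup table can be built from the enum text itself. This is a minimal Python sketch under the assumption that you have the enum body as plain text; `parse_reasons` and `ENUM_EXCERPT` are hypothetical names, and the excerpt holds only a few of the values listed above.

```python
import re

# A short excerpt of the enum above, used only for illustration.
ENUM_EXCERPT = """
  REASON_COMMAND_EXECUTOR_FAILED = 0;
  REASON_CONTAINER_LIMITATION_MEMORY = 8;
  REASON_EXECUTOR_TERMINATED = 1;
"""

_ENTRY_RE = re.compile(r"^\s*(REASON_\w+)\s*=\s*(\d+)\s*;", re.MULTILINE)


def parse_reasons(proto_text: str) -> dict:
    """Build a {number: name} table from `NAME = N;` lines."""
    return {int(num): name for name, num in _ENTRY_RE.findall(proto_text)}


if __name__ == "__main__":
    reasons = parse_reasons(ENUM_EXCERPT)
    print(reasons[8])
```

In a real framework the generated protobuf bindings already expose these names, so a hand-built table like this is mainly useful for quick log analysis.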