Where can I find a more explicit error for a given container error status code?

Time: 2017-05-05 11:18:07

Tags: docker mesos

I am running tasks on a Mesos stack using Docker containers.

Sometimes, some of the tasks fail.

Here are some of the relevant TaskStatus messages and reasons:

message: Container exited with status 1 - reason: REASON_COMMAND_EXECUTOR_FAILED
message: Container exited with status 42 - reason: REASON_COMMAND_EXECUTOR_FAILED
message: Container exited with status 137 - reason: REASON_COMMAND_EXECUTOR_FAILED

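Is there a correspondence table that maps the container error status codes in TaskStatus messages to more explicit errors?

2 answers:

Answer 0: (score: 4)

Command tasks can fail for many reasons, and each sets its own exit code. For example, Docker 1.10 sets the following exit status codes (from the documentation and this answer):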

  

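The exit code from docker run gives information about why the container failed to run or why it exited. When docker run exits with a non-zero code, the exit codes follow the chroot standard, see below: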

     

125 if the error is with the Docker daemon itself:

$ docker run --foo busybox; echo $?
# flag provided but not defined: --foo   See 'docker run --help'.   
     

126 if the contained command cannot be invoked:

$ docker run busybox /etc; echo $?
# docker: Error response from daemon: Container command '/etc' could not be invoked.   
     

127 if the contained command cannot be found:

$ docker run busybox foo; echo $?
# docker: Error response from daemon: Container command 'foo' not found or does not exist.
# 127

Exit code of the contained command otherwise:

$ docker run busybox /bin/sh -c 'exit 3'; echo $?
# 3
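
As a rough illustration of the list above, the three codes reserved by docker run can be kept in a small lookup, with every other value treated as the contained command's own exit code. This is only a sketch: the name `describe_docker_run_exit` is hypothetical and not part of Docker or Mesos.

```python
# Exit codes that `docker run` itself reserves, per the list quoted above.
DOCKER_RUN_CODES = {
    125: "error with the Docker daemon itself",
    126: "contained command cannot be invoked",
    127: "contained command cannot be found",
}


def describe_docker_run_exit(code: int) -> str:
    """Return a short description of a `docker run` exit code (hypothetical helper)."""
    if code in DOCKER_RUN_CODES:
        return DOCKER_RUN_CODES[code]
    # Anything else is the exit code of the contained command.
    return f"exit code {code} of the contained command"


if __name__ == "__main__":
    for code in (125, 126, 127, 3):
        print(code, "->", describe_docker_run_exit(code))
```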

Another set of exit code conventions applies to shell commands:

| Code  |            Meaning             |         Example         |                                                   Comments                                                   |
|-------|--------------------------------|-------------------------|--------------------------------------------------------------------------------------------------------------|
| 1     | Catchall for general errors    | let "var1 = 1/0"        | Miscellaneous errors, such as "divide by zero" and other impermissible operations                            |
| 2     | Misuse of shell builtins       | empty_function() {}     | Missing keyword or command, or permission problem (and diff return code on a failed binary file comparison). |
| 126   | Command invoked cannot execute | /dev/null               | Permission problem or command is not an executable                                                           |
| 127   | "command not found"            | illegal_command         | Possible problem with $PATH or a typo                                                                        |
| 128   | Invalid argument to exit       | exit 3.14159            | exit takes only integer args in the range 0 - 255 (see first footnote)                                       |
| 128+n | Fatal error signal "n"         | kill -9 $PPID of script | $? returns 137 (128 + 9)                                                                                     |
| 130   | Script terminated by Control-C | Ctl-C                   | Control-C is fatal error signal 2, (130 = 128 + 2, see above)                                                |
| 255*  | Exit status out of range       | exit -1                 | exit takes only integer args in the range 0 - 255                                                            |
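
The table can be turned into a small decoder. The sketch below assumes a POSIX system, where Python's `signal.Signals` can name the signal behind a 128+n code. The function name `describe_shell_exit` is made up for illustration, and codes not listed in the table are simply reported as application-specific.

```python
import signal

# Fixed meanings taken from the table above.
NAMED_CODES = {
    0: "success",
    1: "catchall for general errors",
    2: "misuse of shell builtins",
    126: "command invoked cannot execute",
    127: "command not found",
    128: "invalid argument to exit",
    255: "exit status out of range",
}


def describe_shell_exit(code: int) -> str:
    """Describe an exit status using the shell conventions (hypothetical helper)."""
    if code in NAMED_CODES:
        return NAMED_CODES[code]
    if 128 < code < 255:
        # 128 + n means the process was terminated by signal n.
        n = code - 128
        try:
            name = signal.Signals(n).name
        except ValueError:
            # Signal number has no name on this platform.
            name = f"signal {n}"
        return f"terminated by {name} (128 + {n})"
    return "application-specific exit code"


if __name__ == "__main__":
    # The three statuses from the question.
    for code in (1, 42, 137):
        print(code, "->", describe_shell_exit(code))
```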

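Based on your examples: status 1 is the catchall for general errors, 42 is not a reserved code and so is the contained command's own exit code, and 137 is 128 + 9, meaning the process was terminated by signal 9 (SIGKILL).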

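If you need more information to interpret a status code, look at the message field of the Mesos TaskStatus update; for example, Mesos puts information about OOM there. The same information can be found in the Mesos logs. To debug why a command returned a non-zero code, inspect the files stored in the executor sandbox, especially stderr/stdout or command-specific logs.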
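
When many failures need triaging, the numeric status can be pulled out of the message text first. The sketch below assumes the message has exactly the form shown in the question ("Container exited with status N"). The helper name `exit_status_from_message` is hypothetical, and other message formats simply yield `None`.

```python
import re
from typing import Optional

# Matches messages of the form quoted in the question.
STATUS_RE = re.compile(r"Container exited with status (\d+)")


def exit_status_from_message(message: str) -> Optional[int]:
    """Extract the container exit status from a TaskStatus message, if present."""
    match = STATUS_RE.search(message)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for msg in (
        "Container exited with status 1",
        "Container exited with status 42",
        "Container exited with status 137",
    ):
        print(exit_status_from_message(msg))
```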

Answer 1: (score: 1)

I guess you want to look at enum Reason in mesos.proto (copied below):

  enum Reason {
    // TODO(jieyu): The default value when a caller doesn't check for
    // presence is 0 and so ideally the 0 reason is not a valid one.
    // Since this is not used anywhere, consider removing this reason.
    REASON_COMMAND_EXECUTOR_FAILED = 0;

    REASON_CONTAINER_LAUNCH_FAILED = 21;
    REASON_CONTAINER_LIMITATION = 19;
    REASON_CONTAINER_LIMITATION_DISK = 20;
    REASON_CONTAINER_LIMITATION_MEMORY = 8;
    REASON_CONTAINER_PREEMPTED = 17;
    REASON_CONTAINER_UPDATE_FAILED = 22;
    REASON_EXECUTOR_REGISTRATION_TIMEOUT = 23;
    REASON_EXECUTOR_REREGISTRATION_TIMEOUT = 24;
    REASON_EXECUTOR_TERMINATED = 1;
    REASON_EXECUTOR_UNREGISTERED = 2;
    REASON_FRAMEWORK_REMOVED = 3;
    REASON_GC_ERROR = 4;
    REASON_INVALID_FRAMEWORKID = 5;
    REASON_INVALID_OFFERS = 6;
    REASON_IO_SWITCHBOARD_EXITED = 27;
    REASON_MASTER_DISCONNECTED = 7;
    REASON_RECONCILIATION = 9;
    REASON_RESOURCES_UNKNOWN = 18;
    REASON_SLAVE_DISCONNECTED = 10;
    REASON_SLAVE_REMOVED = 11;
    REASON_SLAVE_RESTARTED = 12;
    REASON_SLAVE_UNKNOWN = 13;
    REASON_TASK_CHECK_STATUS_UPDATED = 28;
    REASON_TASK_GROUP_INVALID = 25;
    REASON_TASK_GROUP_UNAUTHORIZED = 26;
    REASON_TASK_INVALID = 14;
    REASON_TASK_UNAUTHORIZED = 15;
    REASON_TASK_UNKNOWN = 16;
  }
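
If a framework only sees the numeric reason, a local mirror of the enum can translate it back to a name. The sketch below copies just a subset of the values listed above into a Python `IntEnum`. It is an illustration rather than the Mesos bindings, and `reason_name` is a hypothetical helper.

```python
from enum import IntEnum


class Reason(IntEnum):
    """Subset of the Reason values copied from the enum above."""
    REASON_COMMAND_EXECUTOR_FAILED = 0
    REASON_EXECUTOR_TERMINATED = 1
    REASON_CONTAINER_LIMITATION_MEMORY = 8
    REASON_CONTAINER_LIMITATION = 19
    REASON_CONTAINER_LIMITATION_DISK = 20
    REASON_CONTAINER_LAUNCH_FAILED = 21


def reason_name(value: int) -> str:
    """Map a numeric reason to its name, tolerating values missing from the subset."""
    try:
        return Reason(value).name
    except ValueError:
        return f"UNKNOWN_REASON_{value}"


if __name__ == "__main__":
    print(reason_name(0))
    print(reason_name(8))
```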