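Why is mesos-slave asked to kill tasks for more than 2 hours after a slave restart?
I am running a Mesos cluster with three masters and four slaves in a cloud environment.
I found that if I reboot a slave that is running Docker tasks, the tasks stay in the staging state on that slave for more than 2 hours after the reboot. Only after those 2 hours can Marathon launch the tasks on other slaves.
If I check the logs, I can see the slave repeating "Asked to kill task" and "Ignoring kill task" for about 2 hours.
Does anyone know why Mesos keeps trying to kill the dead tasks for more than 2 hours?
Logs after the restart: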
May 11 10:12:18 euca-10-254-234-236 mesos-slave[824]: I0511 10:12:18.199795 964 slave.cpp:1891] Asked to kill task project-hub_project-hub-backend.e764cc0d-173f-11e6-b66e-d00dacb0c46b of framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000
May 11 10:12:18 euca-10-254-234-236 mesos-slave[824]: W0511 10:12:18.199831 964 slave.cpp:2018] Ignoring kill task project-hub_project-hub-backend.e764cc0d-173f-11e6-b66e-d00dacb0c46b because the executor 'project-hub_project-hub-backend.e764cc0d-173f-11e6-b66e-d00dacb0c46b' of framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000 is terminating/terminated
May 11 10:12:18 euca-10-254-234-236 mesos-slave[824]: I0511 10:12:18.199872 964 slave.cpp:1891] Asked to kill task docker-registry.d1c20255-173f-11e6-b66e-d00dacb0c46b of framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000
Logs 2 hours later:
I0511 12:15:48.200348 963 slave.cpp:1891] Asked to kill task project-hub_project-hub-backend.e764cc0d-173f-11e6-b66e-d00dacb0c46b of framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000
W0511 12:15:48.200409 963 slave.cpp:2018] Ignoring kill task project-hub_project-hub-backend.e764cc0d-173f-11e6-b66e-d00dacb0c46b because the executor 'project-hub_project-hub-backend.e764cc0d-173f-11e6-b66e-d00dacb0c46b' of framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000 is terminating/terminated
I0511 12:15:48.200429 963 slave.cpp:1891] Asked to kill task docker-registry.d1c20255-173f-11e6-b66e-d00dacb0c46b of framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000
W0511 12:15:48.200438 963 slave.cpp:2018] Ignoring kill task docker-registry.d1c20255-173f-11e6-b66e-d00dacb0c46b because the executor 'docker-registry.d1c20255-173f-11e6-b66e-d00dacb0c46b' of framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000 is terminating/terminated
I0511 12:15:51.485391 964 http.cpp:190] HTTP GET for /slave(1)/state from 10.145.150.124:59955 with User-Agent='Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36'
I0511 12:15:51.509351 965 http.cpp:190] HTTP GET for /slave(1)/state from 10.145.150.124:59955 with User-Agent='Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36'
W0511 12:15:51.656379 960 slave.cpp:4979] Failed to get resource statistics for executor 'project-hub_project-hub-backend.e764cc0d-173f-11e6-b66e-d00dacb0c46b' of framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000: Unknown container: b2f5b385-444b-4174-9a1c-8ccd2d3184dc
W0511 12:15:51.656409 960 slave.cpp:4979] Failed to get resource statistics for executor 'docker-registry.d1c20255-173f-11e6-b66e-d00dacb0c46b' of framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000: Unknown container: 1ab25a1b-79fe-430b-9751-330586a1fbef
I0511 12:15:51.663321 965 http.cpp:190] HTTP GET for /slave(1)/state from 10.145.150.124:59560 with User-Agent='Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36'
I0511 12:15:51.671294 965 http.cpp:190] HTTP GET for /slave(1)/state from 10.145.150.124:59560 with User-Agent='Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36'
W0511 12:15:52.156903 962 slave.cpp:4979] Failed to get resource statistics for executor 'project-hub_project-hub-backend.e764cc0d-173f-11e6-b66e-d00dacb0c46b' of framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000: Unknown container: b2f5b385-444b-4174-9a1c-8ccd2d3184dc
W0511 12:15:52.156941 962 slave.cpp:4979] Failed to get resource statistics for executor 'docker-registry.d1c20255-173f-11e6-b66e-d00dacb0c46b' of framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000: Unknown container: 1ab25a1b-79fe-430b-9751-330586a1fbef
E0511 12:15:52.247448 962 slave.cpp:3773] Container '1ab25a1b-79fe-430b-9751-330586a1fbef' for executor 'docker-registry.d1c20255-173f-11e6-b66e-d00dacb0c46b' of framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000 failed to start: future discarded
E0511 12:15:52.247612 962 slave.cpp:3773] Container 'b2f5b385-444b-4174-9a1c-8ccd2d3184dc' for executor 'project-hub_project-hub-backend.e764cc0d-173f-11e6-b66e-d00dacb0c46b' of framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000 failed to start: future discarded
W0511 12:15:52.247642 962 composing.cpp:541] Container '1ab25a1b-79fe-430b-9751-330586a1fbef' is already destroyed
W0511 12:15:52.247660 962 composing.cpp:541] Container 'b2f5b385-444b-4174-9a1c-8ccd2d3184dc' is already destroyed
E0511 12:15:52.247704 962 slave.cpp:3870] Termination of executor 'docker-registry.d1c20255-173f-11e6-b66e-d00dacb0c46b' of framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000 failed: Unknown container: 1ab25a1b-79fe-430b-9751-330586a1fbef
I0511 12:15:52.248374 962 slave.cpp:3002] Handling status update TASK_FAILED (UUID: b399e8ce-832c-4b06-a15f-3c155536b872) for task docker-registry.d1c20255-173f-11e6-b66e-d00dacb0c46b of framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000 from @0.0.0.0:0
E0511 12:15:52.248458 962 slave.cpp:3870] Termination of executor 'project-hub_project-hub-backend.e764cc0d-173f-11e6-b66e-d00dacb0c46b' of framework 17cd3756-1d59-4dfc-984d-3fe09f6b5730-0000 failed: Unknown container: b2f5b385-444b-4174-9a1c-8ccd2d3184dc
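To put a number on how long the kill requests keep repeating, a small script along these lines could be run over the slave log. This is only a rough sketch: the function name `kill_request_spans` and the `year` parameter are made up for illustration, and the regular expression assumes the glog-style prefix seen in the excerpts above (severity letter, month and day, time, thread id, source file and line).

```python
import re
from datetime import datetime, timedelta

# Matches glog-style lines such as:
#   I0511 10:12:18.199795 964 slave.cpp:1891] Asked to kill task <id> of framework <id>
# A syslog prefix in front of the glog part is tolerated because search() is used.
KILL_REQUEST = re.compile(
    r"[IWEF](?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+:\d+\] "
    r"Asked to kill task (?P<task>\S+) of framework"
)


def kill_request_spans(lines, year=2000) -> dict[str, timedelta]:
    """Return, per task id, the time between the first and last kill request.

    glog lines carry no year, so `year` is an arbitrary placeholder
    (hypothetical parameter); it only matters if the log crosses a year boundary.
    """
    seen = {}
    for line in lines:
        match = KILL_REQUEST.search(line)
        if match is None:
            continue  # not an "Asked to kill task" line
        stamp = datetime.strptime(
            f"{year}-{match['month']}-{match['day']} {match['time']}",
            "%Y-%m-%d %H:%M:%S.%f",
        )
        first, last = seen.get(match["task"], (stamp, stamp))
        seen[match["task"]] = (min(first, stamp), max(last, stamp))
    return {task: last - first for task, (first, last) in seen.items()}
```

Feeding it the log file (for example `kill_request_spans(open(path))`, where `path` is wherever the slave log lives on the machine) gives one duration per task, which makes it easy to compare against any timeout configured on the Marathon or Mesos side.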