I want to run Docker containers through Marathon. (Mesos and Marathon themselves both run in Docker containers.)
When I launch the image directly with the docker run command, everything works fine.
But when the image is launched through Marathon, it is killed after about a minute (60 seconds); Marathon then recreates the container, which is killed again a minute later, and so on.
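For reference, here is roughly how the two launches look. This is only a sketch: the image name and Marathon host are placeholders, while the cpus/mem/disk values match what the logs below report for the task.

# Direct launch -- this works fine (image name is a placeholder):
docker run --rm my-registry/jping:latest

# Launch through Marathon -- this is the one that gets killed after a
# minute. A minimal app definition POSTed to the Marathon REST API:
curl -X POST http://mesos1:8080/v2/apps \
  -H 'Content-Type: application/json' \
  -d '{
        "id": "/jping",
        "cpus": 0.5,
        "mem": 128,
        "disk": 128,
        "instances": 1,
        "container": {
          "type": "DOCKER",
          "docker": { "image": "my-registry/jping:latest" }
        }
      }'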
The strangest part is the following error I found in the mesos-slave log:
Failed to update resources for container f891ffca-39c3-4b70-adae-2520864c42b2 of executor 'jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86' running task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/27321/cgroup: No such file or directory
I have searched for this problem online, but most similar reports were fixed by increasing memory. That does not work here: the task is still killed even when I give it all the memory available on the host.
mesos-slave log:
I0703 06:05:25.992172 18 slave.cpp:5283] Handling status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 from executor(1)@mesos1:35786
E0703 06:05:26.070716 14 slave.cpp:5614] Failed to update resources for container f891ffca-39c3-4b70-adae-2520864c42b2 of executor 'jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86' running task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/27321/cgroup: No such file or directory
I0703 06:05:26.070905 19 docker.cpp:2331] Destroying container f891ffca-39c3-4b70-adae-2520864c42b2 in RUNNING state
I0703 06:05:26.070940 19 docker.cpp:2336] Sending SIGTERM to executor with pid: 792
I0703 06:05:26.070894 12 task_status_update_manager.cpp:328] Received task status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.070981 12 task_status_update_manager.cpp:842] Checkpointing UPDATE for task status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.071130 12 slave.cpp:5775] Forwarding the update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 to master@mesos1:5060
I0703 06:05:26.071277 12 slave.cpp:5684] Sending acknowledgement for status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 to executor(1)@mesos1:35786
I0703 06:05:26.073609 19 docker.cpp:2381] Running docker stop on container f891ffca-39c3-4b70-adae-2520864c42b2
I0703 06:05:26.076584 17 slave.cpp:5907] Got exited event for executor(1)@mesos1:35786
I0703 06:05:26.082994 12 task_status_update_manager.cpp:401] Received task status update acknowledgement (UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.083410 12 task_status_update_manager.cpp:842] Checkpointing ACK for task status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.171222 14 docker.cpp:2560] Executor for container f891ffca-39c3-4b70-adae-2520864c42b2 has exited
I0703 06:05:26.172829 12 slave.cpp:6305] Executor 'jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86' of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 terminated with signal Terminated
I0703 06:05:26.172868 12 slave.cpp:6403] Cleaning up executor 'jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86' of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 at executor(1)@mesos1:35786
I0703 06:05:26.173218 18 gc.cpp:90] Scheduling '/var/tmp/mesos/slaves/82f5f7aa-772c-48b8-b9e9-5675fe0b7fa9-S0/frameworks/6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003/executors/jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86/runs/f891ffca-39c3-4b70-adae-2520864c42b2' for gc 6.99999799573037days in the future
mesos-master log:
I0703 06:05:26.071590 15 master.cpp:7962] Status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 from agent 82f5f7aa-772c-48b8-b9e9-5675fe0b7fa9-S0 at slave(1)@mesos1:5051 (mesos1)
I0703 06:05:26.071923 15 master.cpp:8018] Forwarding status update TASK_FAILED (Status UUID: 4f55a18e-37ea-48fc-8f2d-2228f95a7097) for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.072099 15 master.cpp:10278] Updating the state of task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 (latest state: TASK_FAILED, status update state: TASK_FAILED)
I0703 06:05:26.080749 15 master.cpp:5623] Processing REVIVE call for framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 (marathon) at scheduler-13cb0ef0-e5fb-40ac-aa5e-d8d7284e409b@mesos1:44408
I0703 06:05:26.080828 15 hierarchical.cpp:1339] Revived offers for roles { * } of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003
I0703 06:05:26.081554 13 master.cpp:8870] Sending 1 offers to framework 2b59b774-1033-4f63-b403-fce174f8155b-0004 (Spark Cluster) at scheduler-48bf92fe-e69d-4af8-8d7e-e22cc5177d02@mesos3:46594
I0703 06:05:26.081902 13 master.cpp:8870] Sending 2 offers to framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 (marathon) at scheduler-13cb0ef0-e5fb-40ac-aa5e-d8d7284e409b@mesos1:44408
I0703 06:05:26.082159 12 http.cpp:1185] HTTP GET for /master/state?jsonp=angular.callbacks._3w from 10.1.21.12:65271 with User-Agent='Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
I0703 06:05:26.082294 12 master.cpp:5877] Processing ACKNOWLEDGE call 4f55a18e-37ea-48fc-8f2d-2228f95a7097 for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 (marathon) at scheduler-13cb0ef0-e5fb-40ac-aa5e-d8d7284e409b@mesos1:44408 on agent 82f5f7aa-772c-48b8-b9e9-5675fe0b7fa9-S0
I0703 06:05:26.082341 12 master.cpp:10382] Removing task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86 with resources cpus(allocated: *):0.5; mem(allocated: *):128; disk(allocated: *):128; ports(allocated: *):[50690-50690] of framework 6f16c868-e43d-4d49-aa57-2dee2bbd782d-0003 on agent 82f5f7aa-772c-48b8-b9e9-5675fe0b7fa9-S0 at slave(1)@mesos1:5051 (mesos1)
Marathon log:
[2018-07-03 06:05:26,073] INFO Received status update for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86: TASK_FAILED (Failed to get exit status of container) (mesosphere.marathon.MarathonScheduler:Thread-97)
[2018-07-03 06:05:26,075] INFO all tasks of instance [jping.marathon-eff7501c-7e86-11e8-aea3-22ffdeeedb86] are terminal, requesting to expunge (mesosphere.marathon.core.instance.update.InstanceUpdater$:marathon-akka.actor.default-dispatcher-16)
[2018-07-03 06:05:26,079] INFO Removed app [/jping] from tracker (mesosphere.marathon.core.task.tracker.InstanceTracker$InstancesBySpec:marathon-akka.actor.default-dispatcher-16)
[2018-07-03 06:05:26,080] INFO Increasing delay. Task launch delay for [/jping - 2018-07-03T03:49:41.174Z] is set to 2 seconds 313 milliseconds (mesosphere.marathon.core.launchqueue.impl.RateLimiter$:marathon-akka.actor.default-dispatcher-16)
[2018-07-03 06:05:26,080] INFO receiveInstanceUpdate: instance [jping.marathon-eff7501c-7e86-11e8-aea3-22ffdeeedb86] was deleted (Failed) (mesosphere.marathon.core.launchqueue.impl.TaskLauncherActor:marathon-akka.actor.default-dispatcher-19)
[2018-07-03 06:05:26,080] INFO Received reviveOffers notification: ReviveOffers$ (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-15)
[2018-07-03 06:05:26,080] INFO => revive offers NOW, canceling any scheduled revives (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-15)
[2018-07-03 06:05:26,080] INFO initiating a scale check for runSpec [/jping] due to [instance [jping.marathon-eff7501c-7e86-11e8-aea3-22ffdeeedb86]] Failed (mesosphere.marathon.core.task.update.impl.steps.ScaleAppUpdateStepImpl:marathon-akka.actor.default-dispatcher-16)
[2018-07-03 06:05:26,080] INFO 2 further revives still needed. Repeating reviveOffers according to --revive_offers_repetitions 3 (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-15)
[2018-07-03 06:05:26,080] INFO => Schedule next revive at 2018-07-03T06:05:31.080Z in 5000 milliseconds, adhering to --min_revive_offers_interval 5000 (ms) (mesosphere.marathon.core.flow.impl.ReviveOffersActor:marathon-akka.actor.default-dispatcher-15)
[2018-07-03 06:05:26,080] INFO Acknowledge status update for task jping.eff7501c-7e86-11e8-aea3-22ffdeeedb86: TASK_FAILED (Failed to get exit status of container) (mesosphere.marathon.core.task.update.impl.TaskStatusUpdateProcessorImpl:scala-execution-context-global-150)
[2018-07-03 06:05:26,081] INFO Need to scale /jping from 0 up to 1 instances (mesosphere.marathon.SchedulerActions:scheduler-actions-thread-0)
[2018-07-03 06:05:26,081] INFO Queueing 1 new instances for /jping to the already 0 queued ones (mesosphere.marathon.SchedulerActions:scheduler-actions-thread-0)
[2018-07-03 06:05:26,081] INFO add 1 instances to 0 instances to launch
Is there a similar issue I can refer to?
Marathon version: 1.6.352
Mesos version: 1.5.1
I finally found the problem.
The Marathon task was killed because the mesos-slave itself runs inside a Docker container, in its own PID namespace, so it could not see the PIDs of the processes spawned for the Marathon task. When the agent tried to read /proc/27321/cgroup for a host PID that is not visible inside its namespace, the lookup failed and the container was destroyed.
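You can see the namespace mismatch directly (a sketch; 27321 is the host PID from the error above, and the container name is a placeholder):

# On the host, the process and its cgroup file are visible:
cat /proc/27321/cgroup

# Inside the containerized mesos-slave, which has its own PID namespace,
# the same path does not exist -- exactly the error in the agent log:
docker exec mesos-slave cat /proc/27321/cgroup
# cat: /proc/27321/cgroup: No such file or directory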
The problem went away once I added --pid=host to the docker run command that starts the mesos-slave container.
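For completeness, a sketch of the agent launch with the fix applied; everything except --pid=host (image tag, mounts, agent flags) is illustrative and should be adapted to your setup:

# Share the host PID namespace so the agent can resolve
# /proc/<pid>/cgroup for the tasks it launches:
docker run -d --name mesos-slave \
  --pid=host \
  --net=host \
  --privileged \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /sys:/sys \
  mesosphere/mesos-slave:1.5.1 \
  --master=zk://mesos1:2181/mesos \
  --containerizers=docker,mesos

With --pid=host the agent shares the host's PID namespace, so host PIDs like 27321 are visible in its /proc again and the cgroup lookup no longer fails.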