mesos-containerizer cannot load libmesos-1.0.1.so

Time: 2017-06-23 10:59:29

Tags: mesos

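I am trying to deploy TensorFlow (using the Mesos containerizer) in my DC/OS (v1.8.4) environment. The target node has GPU resources, and the NVIDIA driver is already installed. The TensorFlow deployment fails, and the task's stderr in Mesos contains only one error message: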

 mesos-containerizer: error while loading shared libraries: libmesos-1.0.1.so: cannot open shared object file: No such file or directory

The Mesos log contains the following messages:

Jun 23 10:25:43 gpu-test linker-start-agent.sh[4198]: I0623 10:25:43.069501  4222 linux_launcher.cpp:281] Cloning child process with flags = CLONE_NEWNS | CLONE_NEWPID
Jun 23 10:25:43 gpu-test linker-start-agent.sh[4198]: I0623 10:25:43.072198  4222 systemd.cpp:96] Assigned child process '4977' to 'mesos_executors.slice'
Jun 23 10:25:43 gpu-test linker-start-agent.sh[4198]: I0623 10:25:43.074686  4222 containerizer.cpp:1319] Checkpointing executor's forked pid 4977 to '/var/lib/mesos/slave/meta/slaves/e97f452e-17de-4c0e-b07d-b9955bbc0844-S3/frameworks/00da5f21-f3ab-4237-b85e-8b767ef53d43-0000/executors/gpu2-cuda.49c58356-57fe-11e7-afed-56b5da32b775/runs/f5ad3b1b-5beb-45d6-b4d4-bd592ae09be8/pids/forked.pid'
Jun 23 10:25:43 gpu-test linker-start-agent.sh[4198]: I0623 10:25:43.077404  4222 containerizer.cpp:1863] Executor for container 'f5ad3b1b-5beb-45d6-b4d4-bd592ae09be8' has exited
Jun 23 10:25:43 gpu-test linker-start-agent.sh[4198]: I0623 10:25:43.077416  4222 containerizer.cpp:1622] Destroying container 'f5ad3b1b-5beb-45d6-b4d4-bd592ae09be8'
Jun 23 10:25:43 gpu-test linker-start-agent.sh[4198]: E0623 10:25:43.178581  4219 slave.cpp:3976] Container 'f5ad3b1b-5beb-45d6-b4d4-bd592ae09be8' for executor'gpu2-cuda.49c58356-57fe-11e7-afed-56b5da32b775' of framework 00da5f21-f3ab-4237-b85e-8b767ef53d43-0000 failed to start: Collect failed: Failed to setup hostname and network files: Failed to enter the mount namespace of pid 4977: Pid 4977 does not exist

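The log shows that the failure happens around the forked pid: the executor process (pid 4977) has already exited by the time Mesos tries to enter its mount namespace. However, if I deploy an nginx service with the Mesos containerizer, it runs fine. Only when I deploy TensorFlow or CUDA services does mesos-containerizer fail to load libmesos-1.0.1.so. All of these services use the Mesos containerizer.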
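
One way to narrow down an error like this is to list which shared libraries the binary cannot resolve. The following is only a minimal sketch, assuming `ldd` is available on the agent node; the function names `missing_libraries` and `check_binary` are made up for illustration, and the path to `mesos-containerizer` is not known from the post, so it has to be supplied by the reader.

```python
import subprocess
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libfoo.so => not found"
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def check_binary(path: str) -> List[str]:
    """Run ldd on `path` (a hypothetical, user-supplied binary path)
    and return the libraries it could not resolve."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libraries(result.stdout)
```

The result depends on the environment in which `ldd` runs (for example `LD_LIBRARY_PATH`), so comparing the output under the environment of the working nginx task and of the failing GPU task may show where the two differ.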

I don't know what went wrong.

0 answers:

No answers