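I set up a small test Mesosphere cluster following this guide: https://dcos.io/docs/1.8/administration/installing/custom/cli/ and everything went fine. The cluster has only 3 nodes: one for bootstrap, one master (10.7.1.12), and one agent node (10.7.1.13).
In `/var/log/mesos/mesos-agent.log`, the last entry is timestamped before the reboot.
I have tried all the steps from https://dcos.io/docs/1.8/administration/installing/custom/troubleshooting/, but nothing changed.
Here is the master log after the agent disconnected (`sudo journalctl -u dcos-mesos-master`):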
lut 06 15:48:14 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:48:14.556001 2671 master.cpp:1245] Agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13) disconnected
lut 06 15:48:14 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:48:14.556089 2671 master.cpp:2784] Disconnecting agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13)
lut 06 15:48:14 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:48:14.556170 2671 master.cpp:2803] Deactivating agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13)
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.926198 2670 master.cpp:5334] Shutting down agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13) with message 'health check timed out'
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.926230 2670 master.cpp:6617] Removing agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13): health check timed out
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.926507 2670 master.cpp:6910] Removing task 93f4b075-1338-4a84-afd6-6932cfe44c30 with resources mem(arangodb31, arangodb3):2048; cpus(arangodb31, arangodb3):0.25; disk(arangodb31, arangodb3)[AGENCY_991972e5-2d83-4710-ba3c-de8cf02303ab:myPersistentVolume]:2048; ports(arangodb31, arangodb3):[1026-1026] of framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 on agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13)
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.926695 2670 master.cpp:6910] Removing task 644b59eb-fb20-43fd-a7c1-b1d9406cbfcb with resources mem(arangodb3, arangodb3):2048; cpus(arangodb3, arangodb3):0.25; disk(arangodb3, arangodb3)[AGENCY_0c76702f-ae8b-423c-83a8-1b6e2af8b723:myPersistentVolume]:2048; ports(arangodb3, arangodb3):[1025-1025] of framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 on agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13)
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928460 2670 master.cpp:6736] Removed agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13): health check timed out
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928472 2670 master.cpp:5197] Sending status update TASK_LOST for task 93f4b075-1338-4a84-afd6-6932cfe44c30 of framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 'Slave 10.7.1.13 removed: health check timed out'
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928486 2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 (arangodb3-1) at scheduler-f4f3a3f0-2261-4aaf-9390-81f4b1cc6d20@10.7.1.13:25366
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928611 2670 master.cpp:5197] Sending status update TASK_LOST for task 644b59eb-fb20-43fd-a7c1-b1d9406cbfcb of framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 'Slave 10.7.1.13 removed: health check timed out'
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928638 2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 (arangodb3) at scheduler-7870582e-becd-4747-aeba-0217e91d537e@10.7.1.13:19866
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928747 2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0003 (arangodb3-standalone) at scheduler-180f6695-f3c9-4da6-80e8-d1dc633ec737@10.7.1.13:3583 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928761 2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0003 (arangodb3-standalone) at scheduler-180f6695-f3c9-4da6-80e8-d1dc633ec737@10.7.1.13:3583
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928894 2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 (arangodb3) at scheduler-7870582e-becd-4747-aeba-0217e91d537e@10.7.1.13:19866 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928905 2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 (arangodb3) at scheduler-7870582e-becd-4747-aeba-0217e91d537e@10.7.1.13:19866
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928921 2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 (arangodb3-1) at scheduler-f4f3a3f0-2261-4aaf-9390-81f4b1cc6d20@10.7.1.13:25366 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928928 2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 (arangodb3-1) at scheduler-f4f3a3f0-2261-4aaf-9390-81f4b1cc6d20@10.7.1.13:25366
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928941 2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0001 (metronome) at scheduler-4eb937a4-9a64-4a47-9245-3858defe691a@10.7.1.12:41077 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928963 2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0000 (marathon) at scheduler-02bf4e29-4dd7-4cf8-b14b-4a064b4d082c@10.7.1.12:43643 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
The other journals (`journalctl ...`) are empty.
This error also appears in […]. I would appreciate any advice on how to investigate this further.
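As a reading aid for the master log above, here is a minimal Python sketch that measures how long the master waited between seeing the agent disconnect and removing it. The function name and the regular expression are illustrative assumptions, not part of Mesos or DC/OS; the two sample lines are copied from the log, and the sketch assumes both events fall on the same day.

```python
import re
from datetime import datetime

# glog prefix inside each journal line: severity letter + MMDD, then HH:MM:SS.micros.
GLOG_RE = re.compile(r"[IWEF]\d{4} (\d{2}:\d{2}:\d{2}\.\d{6}) .*?\] (.*)")

SAMPLE = [
    "lut 06 15:48:14 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:48:14.556001 2671 master.cpp:1245] Agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13) disconnected",
    "lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928460 2670 master.cpp:6736] Removed agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13): health check timed out",
]


def seconds_from_disconnect_to_removal(lines):
    """Return seconds between the 'disconnected' and 'Removed agent' lines, or None."""
    disconnected = removed = None
    for line in lines:
        match = GLOG_RE.search(line)
        if not match:
            continue
        # Only the time of day is parsed; same-day events are assumed.
        stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
        message = match.group(2)
        if message.startswith("Agent ") and message.endswith(" disconnected"):
            disconnected = stamp
        elif message.startswith("Removed agent "):
            removed = stamp
    if disconnected is None or removed is None:
        return None
    return (removed - disconnected).total_seconds()


gap = seconds_from_disconnect_to_removal(SAMPLE)
print(f"{gap:.6f}")  # prints 302.372459
```

For the log above this gives roughly five minutes, which matches the 'health check timed out' message: the master gave up on the agent only after it had stayed unreachable for that long.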
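Edit:
I managed to get the agent node running by starting the `dcos-mesos-slave` service manually (before that I had to start the `dcos-spartan` and `dcos-gen-resolvconf` services). Any idea why it did not start automatically?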
Answer 0 (score: 0)
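"Any idea why it did not start automatically?"
According to the rules for using systemd reliably, the systemd units do not depend on each other, so you need to start everything manually.
- `Requires=` and `Wants=` are not allowed. If a unit that is depended upon fails, the unit depending on it will never be started again.
- `Before=` and `After=` are discouraged. They are not strong guarantees; software needs to check that its prerequisites are up and working correctly.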
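The last rule, that software should check its own prerequisites instead of relying on unit ordering, can be sketched in Python as a retry loop. This is a minimal illustration under stated assumptions, not how DC/OS implements it: `wait_for`, `tcp_reachable`, and `master_is_reachable` are hypothetical helper names, 10.7.1.12 is the master address from the question, and 5050 is the Mesos master's default port, assumed unchanged in this cluster.

```python
import socket
import time


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for(check, attempts=10, delay=3.0, sleep=time.sleep):
    """Call check() up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


def master_is_reachable():
    # Hypothetical prerequisite: the master from the question on the
    # default Mesos master port (an assumption about this cluster).
    return tcp_reachable("10.7.1.12", 5050)
```

A wrapper around the agent could call `wait_for(master_is_reachable)` and start its real work only once that returns `True`, so a prerequisite that comes up late causes a delay and not a permanent failure.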