Mesosphere (DC/OS) agent fails to reconnect

Time: 2017-02-06 16:38:27

Tags: mesos mesosphere dcos

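I set up a small test Mesosphere cluster following this guide: https://dcos.io/docs/1.8/administration/installing/custom/cli/ and everything went well. The cluster has only 3 nodes: one for bootstrap, one master node (10.7.1.12), and one agent node (10.7.1.13).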

However, after rebooting the machine that runs the agent node, the master node can no longer see it.

In /var/log/mesos/mesos-agent.log, the last entry has a timestamp from before the reboot. I tried all the steps from https://dcos.io/docs/1.8/administration/installing/custom/troubleshooting/, but nothing changed.

Below is the master log after the agent disconnected (sudo journalctl -u dcos-mesos-master):

lut 06 15:48:14 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:48:14.556001  2671 master.cpp:1245] Agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13) disconnected
lut 06 15:48:14 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:48:14.556089  2671 master.cpp:2784] Disconnecting agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13)
lut 06 15:48:14 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:48:14.556170  2671 master.cpp:2803] Deactivating agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13)
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.926198  2670 master.cpp:5334] Shutting down agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13) with message 'health check timed out'
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.926230  2670 master.cpp:6617] Removing agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13): health check timed out
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.926507  2670 master.cpp:6910] Removing task 93f4b075-1338-4a84-afd6-6932cfe44c30 with resources mem(arangodb31, arangodb3):2048; cpus(arangodb31, arangodb3):0.25; disk(arangodb31, arangodb3)[AGENCY_991972e5-2d83-4710-ba3c-de8cf02303ab:myPersistentVolume]:2048; ports(arangodb31, arangodb3):[1026-1026] of framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 on agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13)
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.926695  2670 master.cpp:6910] Removing task 644b59eb-fb20-43fd-a7c1-b1d9406cbfcb with resources mem(arangodb3, arangodb3):2048; cpus(arangodb3, arangodb3):0.25; disk(arangodb3, arangodb3)[AGENCY_0c76702f-ae8b-423c-83a8-1b6e2af8b723:myPersistentVolume]:2048; ports(arangodb3, arangodb3):[1025-1025] of framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 on agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13)
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928460  2670 master.cpp:6736] Removed agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13): health check timed out
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928472  2670 master.cpp:5197] Sending status update TASK_LOST for task 93f4b075-1338-4a84-afd6-6932cfe44c30 of framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 'Slave 10.7.1.13 removed: health check timed out'
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928486  2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 (arangodb3-1) at scheduler-f4f3a3f0-2261-4aaf-9390-81f4b1cc6d20@10.7.1.13:25366
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928611  2670 master.cpp:5197] Sending status update TASK_LOST for task 644b59eb-fb20-43fd-a7c1-b1d9406cbfcb of framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 'Slave 10.7.1.13 removed: health check timed out'
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928638  2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 (arangodb3) at scheduler-7870582e-becd-4747-aeba-0217e91d537e@10.7.1.13:19866
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928747  2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0003 (arangodb3-standalone) at scheduler-180f6695-f3c9-4da6-80e8-d1dc633ec737@10.7.1.13:3583 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928761  2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0003 (arangodb3-standalone) at scheduler-180f6695-f3c9-4da6-80e8-d1dc633ec737@10.7.1.13:3583
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928894  2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 (arangodb3) at scheduler-7870582e-becd-4747-aeba-0217e91d537e@10.7.1.13:19866 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928905  2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0002 (arangodb3) at scheduler-7870582e-becd-4747-aeba-0217e91d537e@10.7.1.13:19866
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928921  2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 (arangodb3-1) at scheduler-f4f3a3f0-2261-4aaf-9390-81f4b1cc6d20@10.7.1.13:25366 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: W0206 15:53:16.928928  2670 master.hpp:2113] Master attempted to send message to disconnected framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0004 (arangodb3-1) at scheduler-f4f3a3f0-2261-4aaf-9390-81f4b1cc6d20@10.7.1.13:25366
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928941  2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0001 (metronome) at scheduler-4eb937a4-9a64-4a47-9245-3858defe691a@10.7.1.12:41077 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
lut 06 15:53:16 arangodb2.test1.fgtsa.com mesos-master[2661]: I0206 15:53:16.928963  2670 master.cpp:6759] Notifying framework d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-0000 (marathon) at scheduler-02bf4e29-4dd7-4cf8-b14b-4a064b4d082c@10.7.1.12:43643 of lost agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13) after recovering
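
To put a number on the timing in the log above, here is a minimal Python sketch that reads the glog-style timestamps from two of those lines and computes how long the master waited between marking the agent as disconnected and removing it. The year 2017 is an assumption taken from the post date, because the log lines carry only month and day. The helper names (event_time, seconds_between) are illustrative and not part of Mesos.

```python
import re
from datetime import datetime

# Two lines copied from the master log above (journalctl prefix trimmed).
LOG = """\
I0206 15:48:14.556001  2671 master.cpp:1245] Agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 at slave(1)@10.7.1.13:5051 (10.7.1.13) disconnected
I0206 15:53:16.928460  2670 master.cpp:6736] Removed agent d44b38bf-f35b-4c47-8dfb-d53b543b5e8f-S0 (10.7.1.13): health check timed out
"""

# glog-style prefix: severity letter, MMDD, then HH:MM:SS.microseconds.
STAMP = re.compile(r"[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})")

# Assumption: the log has no year, so use the year of the post.
ASSUMED_YEAR = "2017"


def event_time(line):
    """Return the timestamp of one log line, or None if it has none."""
    match = STAMP.search(line)
    if match is None:
        return None
    text = ASSUMED_YEAR + match.group(1) + " " + match.group(2)
    return datetime.strptime(text, "%Y%m%d %H:%M:%S.%f")


def seconds_between(log, first_marker, second_marker):
    """Seconds from the first line containing first_marker to the next
    line containing second_marker, or None if either is missing."""
    first = second = None
    for line in log.splitlines():
        if first is None and first_marker in line:
            first = event_time(line)
        elif first is not None and second_marker in line:
            second = event_time(line)
            break
    if first is None or second is None:
        return None
    return (second - first).total_seconds()


gap = seconds_between(LOG, "disconnected", "Removed agent")
print(f"{gap:.1f} s between disconnect and removal")
```

For these two lines the gap is about five minutes, which matches the 'health check timed out' message: the agent never came back within the master's health-check window.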

The other journals (`journalctl ...`) are empty.

The ZooKeeper log also shows an error (posted as a screenshot, not reproduced here).

I would appreciate any suggestions on how to investigate this further.

Edit:

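I managed to get the agent node running by starting the dcos-mesos-slave service manually (before that I had to start the dcos-spartan and dcos-gen-resolvconf services). Any ideas why it didn't start automatically?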

1 Answer:

Answer 0 (score: 0)

  

Any ideas why it didn't start automatically?

According to the "rules for using systemd reliably", the systemd units do not depend on each other, so you need to start everything manually.

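  • Requires= and Wants= are not allowed. If something that is depended upon fails, the thing depending on it will never try to start again.
  • Before= and After= are discouraged. They are not strong guarantees; software needs to check that its prerequisites are up and working correctly.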
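
As a rough illustration of the second rule, checking prerequisites explicitly instead of relying on unit ordering, here is a minimal Python sketch that brings up the two services named in the question and only then starts the agent. The unit names and their order come from the question. The function names, retry count, and delay are assumptions made for this example. It also assumes each prerequisite is a long-running service for which `systemctl is-active` is meaningful; a oneshot unit reports inactive once it has finished and would need a different check.

```python
import subprocess
import time

# Unit names taken from the question; the order is the one the asker used.
PREREQUISITES = ["dcos-spartan", "dcos-gen-resolvconf"]
AGENT_UNIT = "dcos-mesos-slave"


def is_active(unit):
    """Return True if `systemctl is-active` reports the unit as active."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


def start(unit):
    """Start a unit; raises CalledProcessError if systemctl fails."""
    subprocess.run(["sudo", "systemctl", "start", unit], check=True)


def start_agent(check=is_active, launch=start, retries=5, delay=2.0,
                sleep=time.sleep):
    """Start any prerequisite that is down, wait until it reports active,
    then start the agent. check, launch, and sleep are injectable so the
    logic can be exercised without touching systemd."""
    for unit in PREREQUISITES:
        if not check(unit):
            launch(unit)
        for _ in range(retries):
            if check(unit):
                break
            sleep(delay)
        else:
            # Loop ended without a break: the unit never became active.
            raise RuntimeError(f"{unit} did not become active")
    launch(AGENT_UNIT)


# On the agent node this would be invoked as:
# start_agent()
```

This only automates the manual steps from the question's edit; it does not explain why the units were not started at boot.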