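This is my first question; I hope I'm doing everything correctly.
I have 3 Docker containers on different hosts, each running ZooKeeper, Mesos, and Chronos. The Mesos slaves subscribe to the master correctly, and Chronos jobs are synchronized across all hosts.
The problem is that the Chronos framework keeps connecting and disconnecting: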
I0915 12:12:11.132375 49 master.cpp:2231] Received SUBSCRIBE call for framework 'chronos-2.4.0' at scheduler-e6ebc7bc-8edb-45e9-ad68-3fa36566b55b@10.xxx.xxx.xxx:61740
I0915 12:12:11.132647 49 master.cpp:2302] Subscribing framework chronos-2.4.0 with checkpointing enabled and capabilities [ ]
I0915 12:12:11.133229 49 master.cpp:2312] Framework 71c69a28-ef16-4ed1-b869-04df66f84b5d-0000 (chronos-2.4.0) at scheduler-e6ebc7bc-8edb-45e9-ad68-3fa36566b55b@10.xxx.xxx.xxx:61740 already subscribed, resending acknowledgement
W0915 12:12:11.133322 49 master.hpp:1764] Master attempted to send message to disconnected framework 71c69a28-ef16-4ed1-b869-04df66f84b5d-0000 (chronos-2.4.0) at scheduler-e6ebc7bc-8edb-45e9-ad68-3fa36566b55b@10.xxx.xxx.xxx:61740
E0915 12:12:11.133745 55 process.cpp:1958] Failed to shutdown socket with fd 41: Transport endpoint is not connected
I0915 12:12:25.648849 52 master.cpp:2231] Received SUBSCRIBE call for framework 'chronos-2.4.0' at scheduler-e6ebc7bc-8edb-45e9-ad68-3fa36566b55b@10.xxx.xxx.xxx:61740
I0915 12:12:25.649029 52 master.cpp:2302] Subscribing framework chronos-2.4.0 with checkpointing enabled and capabilities [ ]
I0915 12:12:25.649060 52 master.cpp:2312] Framework 71c69a28-ef16-4ed1-b869-04df66f84b5d-0000 (chronos-2.4.0) at scheduler-e6ebc7bc-8edb-45e9-ad68-3fa36566b55b@10.xxx.xxx.xxx:61740 already subscribed, resending acknowledgement
W0915 12:12:25.649116 52 master.hpp:1764] Master attempted to send message to disconnected framework 71c69a28-ef16-4ed1-b869-04df66f84b5d-0000 (chronos-2.4.0) at scheduler-e6ebc7bc-8edb-45e9-ad68-3fa36566b55b@10.xxx.xxx.xxx:61740
E0915 12:12:25.649433 55 process.cpp:1958] Failed to shutdown socket with fd 41: Transport endpoint is not connected
I0915 12:13:15.146510 50 master.cpp:2231] Received SUBSCRIBE call for framework 'chronos-2.4.0' at scheduler-e6ebc7bc-8edb-45e9-ad68-3fa36566b55b@10.xxx.xxx.xxx:61740
I0915 12:13:15.146759 50 master.cpp:2302] Subscribing framework chronos-2.4.0 with checkpointing enabled and capabilities [ ]
I0915 12:13:15.146848 50 master.cpp:2312] Framework 71c69a28-ef16-4ed1-b869-04df66f84b5d-0000 (chronos-2.4.0) at scheduler-e6ebc7bc-8edb-45e9-ad68-3fa36566b55b@10.xxx.xxx.xxx:61740 already subscribed, resending acknowledgement
W0915 12:13:15.146939 50 master.hpp:1764] Master attempted to send message to disconnected framework 71c69a28-ef16-4ed1-b869-04df66f84b5d-0000 (chronos-2.4.0) at scheduler-e6ebc7bc-8edb-45e9-ad68-3fa36566b55b@10.xxx.xxx.xxx:61740
E0915 12:13:15.147408 55 process.cpp:1958] Failed to shutdown socket with fd 41: Transport endpoint is not connected
I0915 12:14:04.957185 51 master.cpp:2231] Received SUBSCRIBE call for framework 'chronos-2.4.0' at scheduler-e6ebc7bc-8edb-45e9-ad68-3fa36566b55b@10.xxx.xxx.xxx:61740
I0915 12:14:04.957341 51 master.cpp:2302] Subscribing framework chronos-2.4.0 with checkpointing enabled and capabilities [ ]
I0915 12:14:04.957363 51 master.cpp:2312] Framework 71c69a28-ef16-4ed1-b869-04df66f84b5d-0000 (chronos-2.4.0) at scheduler-e6ebc7bc-8edb-45e9-ad68-3fa36566b55b@10.xxx.xxx.xxx:61740 already subscribed, resending acknowledgement
W0915 12:14:04.957392 51 master.hpp:1764] Master attempted to send message to disconnected framework 71c69a28-ef16-4ed1-b869-04df66f84b5d-0000 (chronos-2.4.0) at scheduler-e6ebc7bc-8edb-45e9-ad68-3fa36566b55b@10.xxx.xxx.xxx:61740
E0915 12:14:04.957844 55 process.cpp:1958] Failed to shutdown socket with fd 41: Transport endpoint is not connected
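To see how often the scheduler re-subscribes, the master log lines above can be parsed by their glog header (severity letter, MMDD, time, thread id, file:line). The following is a minimal sketch, assuming the log is in the standard glog layout shown above; `subscribe_intervals` is a hypothetical helper, not part of Mesos.

```python
import re
from datetime import datetime

# glog header: severity letter, MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG_RE = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+\s+(\S+:\d+)\]\s(.*)$"
)


def subscribe_intervals(lines):
    """Return seconds between consecutive 'Received SUBSCRIBE' master log lines."""
    times = []
    for line in lines:
        m = GLOG_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a glog-formatted line
        _sev, mmdd, hms, _loc, message = m.groups()
        if "Received SUBSCRIBE call" in message:
            # glog omits the year; a fixed placeholder year is used only so
            # differences can be computed (wrong across a year boundary)
            times.append(datetime.strptime("1970" + mmdd + " " + hms,
                                           "%Y%m%d %H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Fed the master log, this returns the gaps between SUBSCRIBE attempts; a steady series of retries, each followed by a "disconnected framework" warning, matches the pattern in the excerpt.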
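In this case, mesos-master and the Chronos framework run in the same Docker container, but I suspect that Chronos cannot be reached on port 61740 (which is an ephemeral port).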
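One way to test that suspicion is a plain TCP connect to the address and port the scheduler advertises, run from wherever the master runs. This is a minimal sketch; `probe_tcp` is a hypothetical helper, and the host and port you pass are whatever appears in your own log (the address above is elided).

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> str:
    """Try a plain TCP connect and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # the remote TCP stack answered the SYN with a RST: nothing listening
        return "refused"
    except socket.timeout:
        # no answer at all, e.g. packets dropped on the way
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

A result of "refused" corresponds to the SYN/RST exchange in a packet capture, while "timeout" points to packets being dropped rather than rejected.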
netstat capture:
tcpdump capture:
root@HOSTNAME:/# tcpdump -i eth0 port 61740 -v
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
12:30:41.013731 IP (tos 0x0, ttl 64, id 12013, offset 0, flags [DF], proto TCP (6), length 60)
172.xxx.xxx.xxx.29468 > HOSTNAME.61740: Flags [S], cksum 0xb989 (incorrect -> 0xa894), seq 1155265525, win 14600, options [mss 1460,sackOK,TS val 852942104 ecr 0,nop,wscale 6], length 0
12:30:41.013780 IP (tos 0x0, ttl 64, id 49727, offset 0, flags [DF], proto TCP (6), length 40)
HOSTNAME.61740 > 172.xxx.xxx.xxx.29468: Flags [R.], cksum 0x595a (correct), seq 0, ack 1155265526, win 0, length 0
12:31:18.129849 IP (tos 0x0, ttl 64, id 64040, offset 0, flags [DF], proto TCP (6), length 60)
172.xxx.xxx.xxx.30564 > HOSTNAME.61740: Flags [S], cksum 0xb989 (incorrect -> 0x97fb), seq 535270461, win 14600, options [mss 1460,sackOK,TS val 852979221 ecr 0,nop,wscale 6], length 0
12:31:18.129892 IP (tos 0x0, ttl 64, id 6441, offset 0, flags [DF], proto TCP (6), length 40)
HOSTNAME.61740 > 172.xxx.xxx.xxx.30564: Flags [R.], cksum 0xd9be (correct), seq 0, ack 535270462, win 0, length 0
12:31:36.451417 IP (tos 0x0, ttl 64, id 21303, offset 0, flags [DF], proto TCP (6), length 60)
172.xxx.xxx.xxx.31103 > HOSTNAME.61740: Flags [S], cksum 0xb989 (incorrect -> 0x10c7), seq 186377873, win 14600, options [mss 1460,sackOK,TS val 852997542 ecr 0,nop,wscale 6], length 0
12:31:36.451470 IP (tos 0x0, ttl 64, id 13169, offset 0, flags [DF], proto TCP (6), length 40)
HOSTNAME.61740 > 172.xxx.xxx.xxx.31103: Flags [R.], cksum 0x9a1b (correct), seq 0, ack 186377874, win 0, length 0
12:31:41.619076 IP (tos 0x0, ttl 64, id 41997, offset 0, flags [DF], proto TCP (6), length 60)
172.xxx.xxx.xxx.31252 > HOSTNAME.61740: Flags [S], cksum 0xb989 (incorrect -> 0xfe18), seq 2176478683, win 14600, options [mss 1460,sackOK,TS val 853002710 ecr 0,nop,wscale 6], length 0
12:31:41.619119 IP (tos 0x0, ttl 64, id 13179, offset 0, flags [DF], proto TCP (6), length 40)
HOSTNAME.61740 > 172.xxx.xxx.xxx.31252: Flags [R.], cksum 0x9b9d (correct), seq 0, ack 2176478684, win 0, length 0
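For longer captures, the `tcpdump -v` output can be summarized per packet: which SYNs were answered with a RST, and which checksums tcpdump marked as incorrect. This is a minimal sketch assuming the two-line-per-packet `-v` layout shown above; `summarize` and `refused_syns` are hypothetical helpers.

```python
import re

# Matches the second line of each `tcpdump -v` TCP packet record
PKT_RE = re.compile(
    r"(?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]*)\], "
    r"cksum (?P<cksum>0x[0-9a-f]+) \((?P<status>correct|incorrect)"
)


def summarize(capture: str):
    """Yield (src, dst, flags, checksum_ok) for each TCP packet line."""
    for line in capture.splitlines():
        m = PKT_RE.search(line)
        if m:
            yield m["src"], m["dst"], m["flags"], m["status"] == "correct"


def refused_syns(capture: str):
    """Return (client, server) pairs whose SYN was answered by a RST."""
    syns = set()
    refused = []
    for src, dst, flags, _ok in summarize(capture):
        if flags == "S":
            syns.add((src, dst))  # bare SYN from client to server
        elif "R" in flags and (dst, src) in syns:
            refused.append((dst, src))  # server reset that connection attempt
    return refused
```

On the capture above, every SYN to port 61740 comes back with `[R.]`, and every SYN is flagged `incorrect` while every RST is `correct`.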
172.xxx.xxx.xxx is the container IP, but I am actually running mesos-master like this:
mesos-master --log_dir=/var/log/mesos/master/ --work_dir=/var/log/mesos/work/ --quorum=2 --cluster=XXXX --zk=file:///etc/mesos/zk --advertise_ip=10.XXX.XXX.XXX --hostname=HOSTNAME
Any ideas or suggestions would be appreciated.
Thanks.
Answer:
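In the tcpdump capture we can see incorrect checksums. This seems to be a bug in the kernel version we are running (3.10). It is reportedly fixed in 3.14+, but I can't verify that because we cannot upgrade the kernel in this environment.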
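To check which hosts fall below the 3.14 threshold mentioned above, the running kernel's release string can be compared numerically. This is a minimal sketch; `kernel_older_than` is a hypothetical helper, the 3.14 cutoff is the answer's claim rather than something verified here, and the function only compares version numbers; it does not detect the checksum bug itself.

```python
import re


def kernel_older_than(release: str, minimum=(3, 14)) -> bool:
    """Compare the major.minor part of a kernel release string to a minimum."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2))) < tuple(minimum)
```

On a Linux host, `kernel_older_than(platform.release())` applies it to the running kernel.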