Minion does not respond to long-running commands

Date: 2016-10-18 17:31:42

Tags: python salt-stack tcpdump

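I have an environment where salt-master <-> salt-minion communication is apparently established: TCP ports 4505 and 4506 are open, the key has been accepted, and some functions of the test module work correctly: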

root@minion01 # telnet master01 4505
Trying 100.134.0.200...
Connected to master01.
Escape character is '^]'.

[user@master01 ~]$ salt-key -L
Accepted Keys:
minion01

[user@master01 ~]$ salt 'minion01' test.ping
minion01:
    True

[user@master01 ~]$ salt 'minion01' test.version
minion01:
    2015.8.8
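
The telnet check above only covers port 4505. A minimal sketch for checking both ports from the minion side, using only the Python standard library, could look like the following. The helper name `port_open` and the 3-second connect timeout are illustrative assumptions, and a successful TCP connect only shows the port is reachable, not that Salt traffic passes through the firewall intact.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for illustration; it only tests the TCP handshake.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

It could be called as `port_open("master01", 4505)` and `port_open("master01", 4506)` on minion01.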

However, when I try to execute a binary, I get no response:

[user@master01 ~]$ salt 'minion01' cmd.script 'salt://bin/test01' args='bla'
minion01:
    Minion did not return. [No response]

I also found a useful function in the test module for debugging this issue: test.rand_sleep

If I run that function with the debug flag:

[user@master01 ~]$ salt 'minion01' test.rand_sleep -l debug
[DEBUG   ] Reading configuration from /salt/etc/master
[DEBUG   ] Missing configuration file: /home/user/.saltrc
[DEBUG   ] Configuration file path: /salt/etc/master
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Reading configuration from /salt/etc/master
[DEBUG   ] Missing configuration file: /home/user/.saltrc
[DEBUG   ] MasterEvent PUB socket URI: ipc:///salt/salt/cache/.salt-unix/master_event_pub.ipc
[DEBUG   ] MasterEvent PULL socket URI: ipc:///salt/salt/cache/.salt-unix/master_event_pull.ipc
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/salt/pki/master', 'master01_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] LazyLoaded local_cache.get_load
[DEBUG   ] Reading minion list from /salt/salt/cache/jobs/59/be0dc5d330a8a183114e4826349b02/.minions.p
[DEBUG   ] get_iter_returns for jid 20161018122035908234 sent to set(['minion01']) will timeout at 12:20:40.920216
[DEBUG   ] Checking whether jid 20161018122035908234 is still running
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/salt/pki/master', 'master01_master', 'tcp://127.0.0.1:4506', 'clear')
[DEBUG   ] LazyLoaded no_return.output
minion01:
    Minion did not return. [No response]
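
As a rough check on the debug output above, the jid appears to encode the submission time (YYYYMMDDHHMMSS plus microseconds), so it can be compared with the "will timeout at" value from the same log. This is only a sketch: it assumes the timeout falls on the same calendar day as the jid, and the helper name `jid_to_datetime` is made up for illustration.

```python
from datetime import datetime


def jid_to_datetime(jid):
    """Parse a 20-digit jid of the form YYYYMMDDHHMMSSffffff into a datetime."""
    return datetime.strptime(jid, "%Y%m%d%H%M%S%f")


# Values copied from the debug log above.
jid = "20161018122035908234"
timeout_at = "12:20:40.920216"

submitted = jid_to_datetime(jid)
# Assumption: the timeout occurs on the same day as the submission.
deadline = datetime.combine(
    submitted.date(),
    datetime.strptime(timeout_at, "%H:%M:%S.%f").time(),
)
wait_seconds = (deadline - submitted).total_seconds()
print(wait_seconds)
```

The difference comes out at about 5 seconds, which is consistent with the CLI waiting for its default timeout before reporting "Minion did not return".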

If I sniff the traffic, I actually see the packets going out as expected:

[root@master01 ~]# tcpdump dst 100.134.0.239 -i any and portrange 4505-4506
12:24:57.051480 IP master01.4505 > minion01.49194: Flags [P.], seq 5698:5893, ack 1, win 115, length 195
12:25:02.064182 IP master01.4505 > minion01.49194: Flags [P.], seq 5893:6104, ack 1, win 115, length 211
12:25:04.296926 IP master01.4506 > minion01.49747: Flags [S.], seq 2770281963, ack 425526514, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
12:25:04.297389 IP master01.4506 > minion01.49747: Flags [P.], seq 1:11, ack 1, win 115, length 10
12:25:04.297402 IP master01.4506 > minion01.49747: Flags [.], ack 11, win 115, length 0
12:25:04.297419 IP master01.4506 > minion01.49747: Flags [P.], seq 11:12, ack 11, win 115, length 1
12:25:04.297680 IP master01.4506 > minion01.49747: Flags [P.], seq 12:13, ack 13, win 115, length 1
12:25:04.297704 IP master01.4506 > minion01.49747: Flags [P.], seq 13:15, ack 13, win 115, length 2
12:25:04.302246 IP master01.4506 > minion01.49748: Flags [S.], seq 2644063969, ack 425575524, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
12:25:04.302570 IP master01.4506 > minion01.49748: Flags [P.], seq 1:11, ack 1, win 115, length 10
...

And coming back:

[root@master01 ~]# tcpdump src 100.134.0.239 -i any and portrange 4505-4506
12:27:09.254011 IP minion01.49194 > master01.4505: Flags [.], ack 819763160, win 49640, length 0
12:27:11.606309 IP minion01.49782 > master01.4506: Flags [S], seq 462752770, win 49640, options [mss 1460,nop,wscale 0,nop,nop,sackOK], length 0
12:27:11.606671 IP minion01.49782 > master01.4506: Flags [.], ack 1832475241, win 49640, length 0
12:27:11.606856 IP minion01.49782 > master01.4506: Flags [P.], seq 0:10, ack 1, win 49640, length 10
12:27:11.607084 IP minion01.49782 > master01.4506: Flags [.], ack 12, win 49640, length 0
12:27:11.607133 IP minion01.49782 > master01.4506: Flags [P.], seq 10:12, ack 12, win 49640, length 2
12:27:11.607532 IP minion01.49782 > master01.4506: Flags [.], ack 13, win 49640, length 0
12:27:11.607608 IP minion01.49782 > master01.4506: Flags [P.], seq 12:14, ack 15, win 49640, length 2
12:27:11.611069 IP minion01.49783 > master01.4506: Flags [S], seq 462854740, win 49640, options [mss 1460,nop,wscale 0,nop,nop,sackOK], length 0

So it looks like the communication is suddenly interrupted, but why? How can some functions of the test module work while others do not?

Thanks in advance. Any clue or hint would be greatly appreciated.

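Update: if I run the salt cmd.script command with the timeout flag -t, it does work. However, in many other cases I do not need this option. The main difference I observe between this case and the successful ones is the absence of the following debug message: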

[DEBUG   ] Checking whether jid 20161019054212008948 is still running

even though the minion is configured with custom keepalive settings:

root@minion01 # cat /salt/etc/minion | grep -v ^# | grep -i keepalive
tcp_keepalive_idle: 60
tcp_keepalive_cnt: 3
tcp_keepalive_intvl: 5
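
For completeness, the `-t` workaround from the update can also be expressed through Salt's Python client, whose `LocalClient.cmd` accepts a `timeout` argument. The sketch below is an illustration only: the wrapper name `run_script_with_timeout` and the 60-second value are assumptions, and the client is passed in so the function itself does not depend on a running master.

```python
def run_script_with_timeout(client, target, script, script_args, timeout=60):
    """Run cmd.script on a target, waiting up to `timeout` seconds for a return.

    `client` is expected to behave like salt.client.LocalClient.
    The 60-second default is an arbitrary illustrative value.
    """
    return client.cmd(
        target,
        "cmd.script",
        [script],                      # positional argument: the script source
        timeout=timeout,               # seconds to wait for the minion
        kwarg={"args": script_args},   # keyword arguments passed to cmd.script
    )


# Possible usage on the master (requires Salt to be installed):
#   import salt.client
#   client = salt.client.LocalClient()
#   print(run_script_with_timeout(client, "minion01", "salt://bin/test01", "bla"))
```

This mirrors `salt -t 60 'minion01' cmd.script 'salt://bin/test01' args='bla'` and does not explain why the default timeout is too short in this environment.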

By the way, there is only one network element between the salt-master (master01) and the salt-minion (minion01): an internal firewall. The MTU is correctly set to 1500 everywhere.

0 answers