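I use Docker Swarm (17.06 CE) to orchestrate my microservices. The swarm has 3 managers and 1 worker.
I run the Nginx image in global mode on the swarm managers. I also have a Java-based microservice with 2 replicas on the same overlay network.
I have now found that one of the Nginx containers cannot reach the microservice. The other two Nginx containers can access the service without any problem.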
### There are three Nginx containers in the swarm
➜ ~ docker service ps pilipa-prod-nginx
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
qufld0uu8tk9 pilipa-prod-nginx.4r2p0t892qn55n4uewoymxbp0 registry.i-counting.cn/pilipa/prod/nginx:latest node02 Running Running 21 hours ago
bwjw9c9dm8e1 pilipa-prod-nginx.ixw4urfkdcnkm326vgkw92x8n registry.i-counting.cn/pilipa/prod/nginx:latest node01 Running Running 21 hours ago
2w2gg83xt6g4 pilipa-prod-nginx.5t63dl8dcj603iyw5l5vv0xvx registry.i-counting.cn/pilipa/prod/nginx:latest node03 Running Running 21 hours ago
### Log in to a working Nginx container; it can access the microservice without problems
➜ ~ docker exec --interactive --tty pilipa-prod-nginx.4r2p0t892qn55n4uewoymxbp0.qufld0uu8tk9ieubcimed8fgw sh
/ # ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
10901: eth0@if10902: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
link/ether 02:42:0a:00:00:2c brd ff:ff:ff:ff:ff:ff
inet 10.0.0.44/24 scope global eth0
valid_lft forever preferred_lft forever
inet 10.0.0.11/32 scope global eth0
valid_lft forever preferred_lft forever
10903: eth1@if10904: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
link/ether 02:42:ac:13:00:09 brd ff:ff:ff:ff:ff:ff
inet 172.19.0.9/16 scope global eth1
valid_lft forever preferred_lft forever
/ # wget 10.0.0.71:8080
Connecting to 10.0.0.71:8080 (10.0.0.71:8080)
wget: server returned error: HTTP/1.1 401 Unauthorized
### Log in to the problematic Nginx container; it can ping the microservice host but can NOT access the service
➜ ~ docker exec --interactive --tty pilipa-prod-nginx.ixw4urfkdcnkm326vgkw92x8n.bwjw9c9dm8e1qlx64z5sniw7h sh
/ #
/ #
/ # wget 10.0.0.71:80
Connecting to 10.0.0.71:80 (10.0.0.71:80)
wget: can't connect to remote host (10.0.0.71): Connection refused
/ # ping 10.0.0.71
PING 10.0.0.71 (10.0.0.71): 56 data bytes
64 bytes from 10.0.0.71: seq=0 ttl=64 time=0.066 ms
64 bytes from 10.0.0.71: seq=1 ttl=64 time=0.076 ms
64 bytes from 10.0.0.71: seq=2 ttl=64 time=0.073 ms
^C
--- 10.0.0.71 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.066/0.071/0.076 ms
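Update:
I tried using tcpdump to capture traffic on the microservice host. When accessing the service with ping 10.0.0.71 and wget 10.0.0.71:8080, I can capture the traffic coming from the working Nginx container. However, no ping or wget traffic from the problematic Nginx container is captured at all!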
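A capture like the one described above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the exact command that was run: the helper name build_filter is hypothetical, the interface eth0 is assumed to be the overlay interface as seen from inside the microservice container's network namespace, and 10.0.0.44 is the overlay address of the working Nginx container taken from the ip addr show output. The script only prints the tcpdump command, since running it requires root.

```shell
# Hypothetical helper: build a tcpdump (pcap) filter that matches ICMP and
# TCP traffic to a given port, coming from one source address.
build_filter() {
    src_ip="$1"   # overlay IP of the Nginx container under test
    port="$2"     # microservice port
    printf 'src host %s and (icmp or tcp port %s)' "$src_ip" "$port"
}

# 10.0.0.44 is the working Nginx container's overlay address (from the
# "ip addr show" output); 8080 is the microservice port used with wget.
FILTER="$(build_filter 10.0.0.44 8080)"

# Assumption: eth0 is the overlay interface in the namespace where the
# capture runs. -n disables name resolution, -i selects the interface.
echo "tcpdump -n -i eth0 '$FILTER'"
```

Repeating the capture with the problematic container's overlay address in place of 10.0.0.44 would show whether any of its packets arrive.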
Is this a known bug in the swarm overlay network, or a misconfiguration in my environment?