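Description
I am seeing intermittent communication problems between containers on the same overlay network. I have been trying to find a solution for several weeks, but everything I find on Google about communication problems does not match what I am seeing. So I am hoping someone can help me work out what is going on.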
We are using Docker 17.06
We are using standalone swarm with three masters and one node.
We have multiple overlay networks
Containers attached to each overlay network:
1 container running Apache Tomcat 8.5 and HAproxy 1.7 (called the controller)
1 container just running Apache Tomcat 8.5 (called the apps container)
3 containers running Postgresql 9.6
1 container running an FTP service
1 container running Logstash
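Steps to reproduce the issue:
Create a new overlay network
Attach the containers
Watch the logs; after a while the errors appear
Describe the results you received:
The "controller" polls a servlet on the "apps" container every few seconds. Roughly every 15 minutes we see a connection timeout error in the "controller" log file. In addition, we see failed connection attempts when the controller tries to reach its database in one of the Postgresql containers.
Error when polling the apps container
org.apache.http.conn.ConnectTimeoutException: Connect to srvpln50-webapp_1.0-1:5050 [srvpln50-webapp_1.0-1/10.0.1.6] failed: connect timed out
Error when trying to connect to the database
JavaException: com.ebasetech.xi.exceptions.FormRuntimeException: Error getting connection using Database Connection CONTROLLER, SQLException in StandardPoolDataSource:getConnection exception: java.sql.SQLException: SQLException in StandardPoolDataSource:getConnection no connection available java.sql.SQLException: Cannot get connection for URL jdbc:postgresql://srvpln50-controller-db_latest:5432/ctrldata : The connection attempt failed.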
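To separate network-level failures from anything specific to Tomcat or the JDBC pool, a small TCP probe can be run from inside the affected container. The following is only a sketch, assuming Python 3 is available there; the names `probe` and `watch` are hypothetical helpers, not part of Docker or the application.

```python
import socket
import time


def probe(host, port, timeout=3.0):
    """Try one TCP connect; return (ok, error), where error is None on success."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:  # covers timeouts, refusals, and name-resolution failures
        return False, exc


def watch(host, port, interval=5.0, attempts=10):
    """Probe repeatedly, print a UTC-timestamped line per failure, return the failure count."""
    failures = 0
    for _ in range(attempts):
        ok, err = probe(host, port)
        if not ok:
            failures += 1
            stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
            print(f"{stamp} connect to {host}:{port} failed: {err!r}")
        time.sleep(interval)
    return failures
```

For example, `watch("srvpln50-webapp_1.0-1", 5050)` run in the controller container would print a timestamp for each failed connect, which can then be lined up against the timestamps in the daemon log.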
I turned on debug mode for the Docker daemon on the node.
Every time these errors occur, I see the following related entries in the Docker logs:
Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.422797691Z" level=debug msg="Name To resolve: srvpln50-webapp_1.0-1."
Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.422905040Z" level=debug msg="Lookup for srvpln50-webapp_1.0-1.: IP [10.0.1.6]"
Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.648262289Z" level=debug msg="miss notification: dest IP 10.0.0.3, dest MAC 02:42:0a:00:00:03"
Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.716329366Z" level=debug msg="miss notification: dest IP 10.0.0.6, dest MAC 02:42:0a:00:00:06"
Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.716952000Z" level=debug msg="miss notification: dest IP 10.0.0.6, dest MAC 02:42:0a:00:00:06"
Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.802320875Z" level=debug msg="miss notification: dest IP 10.0.0.3, dest MAC 02:42:0a:00:00:03"
Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.944189349Z" level=debug msg="miss notification: dest IP 10.0.0.9, dest MAC 02:42:0a:00:00:09"
Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.944770233Z" level=debug msg="miss notification: dest IP 10.0.0.9, dest MAC 02:42:0a:00:00:09"
IP 10.0.0.3 is the "controller" container
IP 10.0.0.6 is the "apps" container
IP 10.0.0.9 is the "postgresql" container that the "controller" is trying to connect to.
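To see which destinations account for most of the "miss notification" entries over a longer period, the debug log can be tallied by destination IP. This is a rough sketch, assuming the log lines keep the format shown above; `MISS_RE` and `count_misses` are hypothetical names, and the regular expression is written against those sample lines only.

```python
import re
from collections import Counter

# Matches dockerd debug lines of the form:
#   msg="miss notification: dest IP 10.0.0.3, dest MAC 02:42:0a:00:00:03"
MISS_RE = re.compile(
    r'msg="miss notification: dest IP (?P<ip>[\d.]+), dest MAC (?P<mac>[0-9a-f:]+)"'
)


def count_misses(lines):
    """Return a Counter mapping destination IP to its number of miss notifications."""
    counts = Counter()
    for line in lines:
        match = MISS_RE.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


# Three lines copied from the log excerpt above, used as a quick self-check.
SAMPLE = [
    'Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.422905040Z" level=debug msg="Lookup for srvpln50-webapp_1.0-1.: IP [10.0.1.6]"',
    'Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.648262289Z" level=debug msg="miss notification: dest IP 10.0.0.3, dest MAC 02:42:0a:00:00:03"',
    'Feb 09 14:27:26 swarm-node-1 dockerd[12193]: time="2018-02-09T14:27:26.716329366Z" level=debug msg="miss notification: dest IP 10.0.0.6, dest MAC 02:42:0a:00:00:06"',
]

if __name__ == "__main__":
    for ip, n in count_misses(SAMPLE).most_common():
        print(ip, n)
```

Passing the lines of a saved daemon log to `count_misses` instead of `SAMPLE` gives per-container totals, which can be matched to containers using the IP list above.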
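Describe the results you expected:
No connection errors
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
Client: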
Version: 17.06.1-ce
API version: 1.30
Go version: go1.8.3
Git commit: 874a737
Built: Thu Aug 17 22:51:12 2017
OS/Arch: linux/amd64
Server:
Version: 17.06.1-ce
API version: 1.30 (minimum version 1.12)
Go version: go1.8.3
Git commit: 874a737
Built: Thu Aug 17 22:50:04 2017
OS/Arch: linux/amd64
Experimental: false
Output of docker info:
Containers: 19
Running: 19
Paused: 0
Stopped: 0
Images: 18
Server Version: 17.06.1-ce
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 385
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.4.0-108-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.784GiB
Name: swarm-node-1
ID: O5ON:VQE7:IRV6:WCB7:RQO4:RIZ4:XFHE:AUCX:ZLM2:GPZL:DXQO:BCIX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 217
Goroutines: 371
System Time: 2018-02-09T15:50:01.902816981Z
EventsListeners: 2
Registry: https://index.docker.io/v1/
Labels:
name=swarm-node-1
Experimental: false
Cluster Store: etcd://localhost:2379/store
Cluster Advertise: 10.80.120.13:2376
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
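Additional environment details (AWS, VirtualBox, physical, etc.):
The Swarm masters, the node, and the containers all run Ubuntu 16.04 on bare-metal servers.
If I have left out anything that would help with the diagnosis, please let me know.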
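Answer 0 (score: 0)
After reading many comments from Docker staff, found through Google, about the numerous communication issues fixed in recent Docker releases, we upgraded to 17.12 CE, and all the problems we were having disappeared.
I would love to know what the underlying problem was, but I am happy to see it gone.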