Docker service tasks stuck in Preparing state after reboot on Windows

Posted: 2019-09-26 21:50:21

Tags: windows docker containers docker-swarm windows-container

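After rebooting a Windows server that is a swarm worker, its Windows containers get stuck in the "Preparing" state indefinitely once the server and the Docker daemon are back online.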

Screenshot of the tasks/containers stuck in the Preparing state:

https://user-images.githubusercontent.com/4528753/65180353-4e5d6e80-da22-11e9-8060-451150865177.png

Steps to reproduce the issue:

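  1. Create a swarm (in my case I have CentOS 7 managers and several Windows Server 1903 workers).
  2. Create a "global" docker service that runs only on the Windows machines. The tasks start fine and work correctly at first.
  3. Drain one or more of the Windows nodes that are running the Windows containers from step 2 (docker node update --availability drain nodename).
  4. Reboot one or more of the nodes drained in step 3 and wait for them to come back up.
  5. Set the Windows nodes back to active (docker node update --availability active nodename).
  6. At this point, the docker service created in step 2 shows its containers on these nodes as "Preparing" to start, and they stay in that state (docker service ps servicename --no-trunc). You can observe this, and run all of these commands, from any manager node.
Log entries from the worker's Docker daemon (10.60.3.40):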
memberlist: Refuting a suspect message (from: c9347e85405d)
memberlist: Failed to send ping: write udp 10.60.3.40:7946->10.60.3.110:7946: wsasendto: The requested address is not valid in its
          context.
grpc: addrConn.createTransport failed to connect to {10.60.3.110:2377 0  <nil>}. Err :connection error: desc = "transport: Error while
          dialing dial tcp 10.60.3.110:2377: connectex: A socket operation was attempted to an unreachable host.". Reconnecting... [module=grpc]
memberlist: Failed to send ping: write udp 10.60.3.40:7946->10.60.3.186:7946: wsasendto: The requested address is not valid in its
          context.
grpc: addrConn.createTransport failed to connect to {10.60.3.110:2377 0  <nil>}. Err :connection error: desc = "transport: Error while
          dialing dial tcp 10.60.3.110:2377: connectex: A socket operation was attempted to an unreachable host.". Reconnecting... [module=grpc]
agent: session failed [node.id=wuhifvg9li3v5zuq2xu7c6hxa module=node/agent error=rpc error: code = Unavailable desc = all SubConns are
          in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.60.3.69:2377:
          connectex: A socket operation was attempted to an unreachable host." backoff=6.3s]
Failed to send gossip to 10.60.3.110: write udp 10.60.3.40:7946->10.60.3.110:7946: wsasendto: The requested address is not valid in its
          context.
Failed to send gossip to 10.60.3.69: write udp 10.60.3.40:7946->10.60.3.69:7946: wsasendto: The requested address is not valid in its
          context.
Failed to send gossip to 10.60.3.105: write udp 10.60.3.40:7946->10.60.3.105:7946: wsasendto: The requested address is not valid in its
          context.
Failed to send gossip to 10.60.3.69: write udp 10.60.3.40:7946->10.60.3.69:7946: wsasendto: The requested address is not valid in its
          context.
Failed to send gossip to 10.60.3.186: write udp 10.60.3.40:7946->10.60.3.186:7946: wsasendto: The requested address is not valid in its
          context.
Failed to send gossip to 10.60.3.105: write udp 10.60.3.40:7946->10.60.3.105:7946: wsasendto: The requested address is not valid in its
          context.
Failed to send gossip to 10.60.3.186: write udp 10.60.3.40:7946->10.60.3.186:7946: wsasendto: The requested address is not valid in its
          context.
Failed to send gossip to 10.60.3.69: write udp 10.60.3.40:7946->10.60.3.69:7946: wsasendto: The requested address is not valid in its
          context.
Failed to send gossip to 10.60.3.105: write udp 10.60.3.40:7946->10.60.3.105:7946: wsasendto: The requested address is not valid in its
          context.
Failed to send gossip to 10.60.3.109: write udp 10.60.3.40:7946->10.60.3.109:7946: wsasendto: The requested address is not valid in its
          context.
Failed to send gossip to 10.60.3.69: write udp 10.60.3.40:7946->10.60.3.69:7946: wsasendto: The requested address is not valid in its
          context.
Failed to send gossip to 10.60.3.110: write udp 10.60.3.40:7946->10.60.3.110:7946: wsasendto: The requested address is not valid in its
          context.
memberlist: Failed to send gossip to 10.60.3.105:7946: write udp 10.60.3.40:7946->10.60.3.105:7946: wsasendto: The requested address is
          not valid in its context.
memberlist: Failed to send gossip to 10.60.3.186:7946: write udp 10.60.3.40:7946->10.60.3.186:7946: wsasendto: The requested address is
          not valid in its context.

Many of these errors are strange. For example, port 7946 is completely open between the swarm nodes, which I confirmed with telnet.
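The log excerpt above mixes two kinds of failure: UDP gossip writes to port 7946 and TCP dials to the managers on port 2377. A small shell function can tally them per protocol, port and peer. This is a sketch that assumes the daemon log has been saved as plain text and that each "write udp" or "dial tcp" fragment sits on one physical line, as it does in this excerpt. Worth keeping in mind while reading the tally: Swarm uses port 7946 over both TCP and UDP, whereas a telnet check only exercises TCP.

```shell
#!/usr/bin/env bash
# Tally swarm connection failures in a Docker daemon log read from stdin.
# Output lines look like "<count> <protocol> <port> <peer-ip>", largest count first.
# Assumption: each "write udp SRC->DST" or "dial tcp DST" fragment is on one line.
summarize_swarm_errors() {
  grep -oE 'write udp [0-9.]+:[0-9]+->[0-9.]+:[0-9]+|dial tcp [0-9.]+:[0-9]+' |
    sed -E \
      -e 's/^write udp [0-9.]+:[0-9]+->([0-9.]+):([0-9]+)$/udp \2 \1/' \
      -e 's/^dial tcp ([0-9.]+):([0-9]+)$/tcp \2 \1/' |
    awk '{ seen[$0]++ } END { for (k in seen) print seen[k], k }' |
    sort -rn
}
```

Usage: summarize_swarm_errors < worker.log, where worker.log is a hypothetical file holding the daemon log lines.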

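I expect the docker service containers to start quickly instead of staying in the "Preparing" state. The Docker images are already pulled, so startup should be fast.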
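The reproduction steps can be driven from a manager node with a few shell functions. This is a sketch under stated assumptions: the service name, image and node names are placeholders, the image is assumed to run a long-lived process, step 4 (rebooting the Windows host) is left manual, and the constraint node.platform.os==windows is one way to keep a global service on Windows nodes, not necessarily the one used in this post.

```shell
#!/usr/bin/env bash
# Reproduction helpers, run on a swarm manager.
# Service name, image and node name are hypothetical placeholders passed as arguments.

# Step 2: a global service constrained to Windows nodes.
# --detach returns immediately instead of waiting for the tasks to converge.
create_windows_global_service() {
  local service="$1" image="$2"
  docker service create --name "$service" --mode global --detach \
    --constraint 'node.platform.os==windows' "$image"
}

# Step 3: drain a Windows worker before rebooting it.
drain_node() {
  docker node update --availability drain "$1"
}

# Step 5: put the rebooted worker back into rotation.
activate_node() {
  docker node update --availability active "$1"
}

# Step 6: show the node and current state of every task that should be running.
show_tasks() {
  docker service ps "$1" --no-trunc \
    --filter 'desired-state=running' \
    --format '{{.Node}} {{.CurrentState}}'
}
```

For example: drain_node SWARMWORKER1, reboot that host and wait for it to come back, activate_node SWARMWORKER1, then run show_tasks servicename and look for nodes whose state stays at Preparing.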

Output of docker version

Client: Docker Engine - Enterprise
 Version:           19.03.2
 API version:       1.40
 Go version:        go1.12.8
 Git commit:        c92ab06ed9
 Built:             09/03/2019 16:38:11
 OS/Arch:           windows/amd64
 Experimental:      false

Server: Docker Engine - Enterprise
 Engine:
  Version:          19.03.2
  API version:      1.40 (minimum version 1.24)
  Go version:       go1.12.8
  Git commit:       c92ab06ed9
  Built:            09/03/2019 16:35:47
  OS/Arch:          windows/amd64
  Experimental:     false

Output of docker info

Client:
 Debug Mode: false
 Plugins:
  cluster: Manage Docker clusters (Docker Inc., v1.1.0-8c33de7)

Server:
 Containers: 4
  Running: 0
  Paused: 0
  Stopped: 4
 Images: 4
 Server Version: 19.03.2
 Storage Driver: windowsfilter
  Windows:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics l2bridge l2tunnel nat null overlay transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
 Swarm: active
  NodeID: wuhifvg9li3v5zuq2xu7c6hxa
  Is Manager: false
  Node Address: 10.60.3.40
  Manager Addresses:
   10.60.3.110:2377
   10.60.3.186:2377
   10.60.3.69:2377
 Default Isolation: process
 Kernel Version: 10.0 18362 (18362.1.amd64fre.19h1_release.190318-1202)
 Operating System: Windows Server Datacenter Version 1903 (OS Build 18362.356)
 OSType: windows
 Architecture: x86_64
 CPUs: 4
 Total Memory: 8GiB
 Name: SWARMWORKER1
 ID: V2WJ:OEUM:7TUQ:WPIO:UOK4:IAHA:KWMN:RQFF:CAUO:LUB6:DJIJ:OVBX
 Docker Root Dir: E:\docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: this node is not a swarm manager - check license status on a manager node

Additional details

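  • These nodes do not use Docker Desktop for Windows. I configured Docker on the boxes mostly by following the PowerShell instructions here: https://docs.docker.com/install/windows/docker-ee/
  • Windows Firewall is disabled
  • iptables / firewalld are disabled
  • Communication between the swarm nodes is completely open
  • Up to date on cumulative updates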

I filed an issue on the Moby repo but never heard back: https://github.com/moby/moby/issues/39955

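The only way I have found to work around this temporarily is to drain the node from the swarm, delete the Docker files, reinstall the Windows "Containers" feature, and then rejoin the swarm. However, the problem comes back after the next reboot.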
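The manager-side half of that workaround can be scripted, with the Windows-side steps kept manual and listed only as comments. This is a sketch: the node name is a placeholder, and the PowerShell cmdlets named in the comments are the standard Windows Server feature cmdlets, not necessarily the exact commands used in this post.

```shell
#!/usr/bin/env bash
# Manager-side part of the drain / remove / rejoin workaround (sketch).
# Order of operations:
#   1. On a manager: docker node update --availability drain NODE
#   2. On the Windows worker (manual):
#        docker swarm leave --force
#        stop Docker and delete its files (Docker Root Dir is E:\docker on this host)
#        PowerShell: Uninstall-WindowsFeature / Install-WindowsFeature -Name Containers,
#        rebooting as prompted
#   3. On a manager: forget_worker NODE (below)
#   4. On the worker: run the "docker swarm join ..." command printed in step 3

# Remove the stale node entry (it shows as Down once the worker has left the swarm)
# and print the join command for the freshly reset worker.
forget_worker() {
  local node="$1"
  docker node rm "$node" || return 1
  docker swarm join-token worker
}
```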

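Interestingly, when I see a swarm task in the "Preparing" state on a Windows worker, the server does not seem to be doing anything at all. It is as if the manager thinks the worker is preparing the container, but the worker isn't...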
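To compare what the manager believes with what the worker is actually doing, the manager's stored status for each stuck task can be dumped. A sketch, assuming it runs on a manager, that the node name argument is a placeholder (for example SWARMWORKER1), and that docker node ps reports CurrentState strings beginning with the word Preparing, as docker service ps does.

```shell
#!/usr/bin/env bash
# List a node's tasks that the manager still reports as Preparing and print
# the status the manager has recorded for each one. Run on a manager.
preparing_tasks() {
  local node="$1"
  docker node ps "$node" --no-trunc \
    --filter 'desired-state=running' \
    --format '{{.ID}} {{.CurrentState}}' |
    awk '$2 == "Preparing" { print $1 }' |
    while read -r task_id; do
      printf '%s ' "$task_id"
      # </dev/null keeps docker from consuming the loop's input.
      docker inspect --type task \
        --format '{{.Status.State}} {{.Status.Message}} {{.Status.Timestamp}}' \
        "$task_id" </dev/null
    done
}
```

If the recorded timestamp stops advancing while the worker sits idle, that would be consistent with the manager waiting on a status update the worker never sends.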

Does anyone have any suggestions?

0 answers