Question

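I'm running into the same problem as described here and here. I have tried everything that worked in those two cases, to no avail: I still see the same behavior. Can anyone suggest alternatives I could try?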

My setup:

I'm running three CentOS 7.2 boxes. The Network Time Protocol daemon (ntpd) is running on all of them, and all of them are fully updated. Here are some details:

Linux version 3.10.0-327.28.2.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.3 20140911 (Red Hat 4.8.3-9) (GCC) )

Docker version:

# docker version
Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        
 OS/Arch:      linux/amd64

Setting up the swarm manager:

>docker swarm init --advertise-addr 10.1.1.40:2377 --force-new-cluster
// on some retry attempts (after 'docker swarm leave --force') I ran:
>docker swarm init --advertise-addr 10.1.1.40:2377 --force-new-cluster

Manager status:

>docker node inspect self
[
{
    "ID": "3x5q1n9v956g3ptdle2eve856",
    "Version": {
        "Index": 10
    },
    "CreatedAt": "2016-08-27T13:01:13.400345797Z",
    "UpdatedAt": "2016-08-27T13:01:13.580143388Z",
    "Spec": {
        "Role": "manager",
        "Availability": "active"
    },
    "Description": {
        "Hostname": "mymanagerhost.mycompany.com",
        "Platform": {
            "Architecture": "x86_64",
            "OS": "linux"
        },
        "Resources": {
            "NanoCPUs": 4000000000,
            "MemoryBytes": 16659128320
        },
        "Engine": {
            "EngineVersion": "1.12.1",
            "Plugins": [
                {
                    "Type": "Network",
                    "Name": "bridge"
                },
                {
                    "Type": "Network",
                    "Name": "host"
                },
                {
                    "Type": "Network",
                    "Name": "null"
                },
                {
                    "Type": "Network",
                    "Name": "overlay"
                },
                {
                    "Type": "Volume",
                    "Name": "local"
                }
            ]
        }
    },
    "Status": {
        "State": "ready"
    },
    "ManagerStatus": {
        "Leader": true,
        "Reachability": "reachable",
        "Addr": "10.1.1.40:2377"
    }
}
]

On the worker nodes (I have two, but they both behave the same way).

Joining the swarm:

>docker swarm join     --token SWMTKN-1-4fjh7kncdpwjvxnxisamhldgenmmnqyvhnx9qdi8d4hkkfuacv-168gs9okd5ck0r4lokdgpef92     10.1.1.40:2377

Error response from daemon: Timeout was reached before node was joined. Attempt to join the cluster will continue in the background. Use "docker info" command to see the current swarm status of your node.

Output of the docker info command:

>docker info
Plugins:
 Volume: local
 Network: null host bridge overlay
Swarm: pending
 NodeID: 
 Error: rpc error: code = 1 desc = context canceled
 Is Manager: false
 Node Address: 10.1.1.50
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 3.10.0-327.28.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.52 GiB
Name: myWorkerNode.mycompany.com
ID: DAWE:VDRA:ZUVS:P7PH:ADCP:MFNU:2LOS:C6TG:XSIS:Y7EX:I46S:KFXT
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Insecure Registries:
 127.0.0.0/8
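
The `Swarm: pending` line in the output above is the field to watch while the join keeps retrying in the background. A minimal sketch for pulling just that field out of `docker info`; the helper name `swarm_state` is hypothetical, and it assumes the plain-text layout shown above:

```shell
# Hypothetical helper: print the value of the "Swarm:" line from
# `docker info` text read on stdin (for example "pending" or "active").
swarm_state() {
    awk -F': *' '/^[[:space:]]*Swarm:/ { print $2; exit }'
}

# Usage on a node (needs a running Docker daemon):
#   docker info 2>/dev/null | swarm_state
```

Polling this after a `docker swarm join` shows whether the node ever leaves the `pending` state without rereading the full output each time.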

Edit in response to the first answer below

So I tried the leave and stop/start commands. On the manager I did:

# docker swarm leave --force
Node left the swarm.
# service docker stop
Redirecting to /bin/systemctl stop  docker.service
# 
# service docker start
Redirecting to /bin/systemctl start  docker.service

# docker swarm init --advertise-addr 10.1.1.40:2377
Swarm initialized: current node (0e0y2k2hngnwyeg86ilzbrjmu) is now a manager.

To add a worker to this swarm, run the following command:
docker swarm join \
    --token SWMTKN-1-2ggj60tnbppgjlg63a58oe5pqtv0vfrpj81hheawanf76x7cjc-7v48qak22wd03y3jyv903a9if \
10.1.1.40:2377

Then on the worker I did:

# docker swarm leave
Node left the swarm.
# service docker stop
Redirecting to /bin/systemctl stop  docker.service
# service docker start
Redirecting to /bin/systemctl start  docker.service
# docker swarm join \
>     --token SWMTKN-1-2ggj60tnbppgjlg63a58oe5pqtv0vfrpj81hheawanf76x7cjc-7v48qak22wd03y3jyv903a9if \
>     10.1.1.40:2377
Error response from daemon: Timeout was reached before node was joined. Attempt to join the cluster will continue in the background. Use "docker info" command to see the current swarm status of your node.

This is clearly the same behavior...
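
Since the join times out rather than being refused outright, one thing worth ruling out is whether the worker can open a TCP connection to the manager's port at all. A rough sketch, assuming bash and the coreutils `timeout` command are available on the worker; `port_open` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if a TCP connection to host:port opens
# within 3 seconds. Relies on bash's /dev/tcp pseudo-device.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage from a worker, with the manager address used above:
#   port_open 10.1.1.40 2377 && echo reachable || echo "blocked or closed"
```

A failure here would point at the network path or a firewall rather than at swarm itself.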

Update

I have tried all the steps outlined by @Miad Abrin and still see the same behavior. I suspect the cause is related to the certificate errors I see when I run:

# journalctl -xe
Aug 29 12:26:15 dockerd[6577]: time="2016-08-29T12:26:15.554904435-04:00" level=warning msg="failed to retrieve remote root CA certificate: rpc
Aug 29 12:26:15 dockerd[6577]: time="2016-08-29T12:26:15.555400400-04:00" level=warning msg="failed to retrieve remote root CA certificate: rpc
Aug 29 12:26:15 dockerd[6577]: time="2016-08-29T12:26:15.555478782-04:00" level=warning msg="failed to retrieve remote root CA certificate: rpc
Aug 29 12:26:15 dockerd[6577]: time="2016-08-29T12:26:15.555528929-04:00" level=warning msg="failed to retrieve remote root CA certificate: rpc
Aug 29 12:26:15 dockerd[6577]: time="2016-08-29T12:26:15.555685464-04:00" level=warning msg="failed to retrieve remote root CA certificate: rpc
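
The warnings above are cut off after `rpc`, which is where the actual gRPC error text would appear. That is likely the pager chopping long lines rather than the daemon logging a short message. A small sketch for printing the full lines and counting them; it assumes the daemon runs as the `docker` systemd unit, and `count_ca_warnings` is a hypothetical helper name:

```shell
# Hypothetical helper: count "remote root CA" warnings in log text on stdin.
count_ca_warnings() {
    grep -c 'failed to retrieve remote root CA certificate'
}

# Usage (as root on the worker); --no-pager prints each line in full:
#   journalctl -u docker --no-pager | grep 'remote root CA' | tail -n 5
#   journalctl -u docker --no-pager | count_ca_warnings
```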

Does anyone know the cause of this and how to fix it?

Answer 1

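You need to restart the docker daemon service before and after leaving the swarm. Do this for both the swarm leader and the workers. This is a bug in 1.12 that was fixed in 1.12.1; I had the same problem.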

Results when trying this

In the two sections below I use (num) numbering to show the order of steps between the worker and the manager:

On the worker:

(1)# docker swarm leave --force
Error response from daemon: This node is not part of a swarm
(2)# service docker stop
Redirecting to /bin/systemctl stop  docker.service
(6)# service docker start
Redirecting to /bin/systemctl start  docker.service
# 
(7)# docker swarm join \
>     --token SWMTKN-1-4gsdy8jshxmd58mvpcm0tlmbbnrrqdrf51ggcwvdv0bvkltxmy-am9o4dsl4ovx6b4lbsabn0fc7 \
>     10.1.1.40:2377
Error response from daemon: Timeout was reached before node was joined. The attempt to join the swarm will continue in the background. Use the "docker info" command to see the current swarm status of your node.
(8)# nmap -p2377 10.1.1.40

Starting Nmap 6.40 ( http://nmap.org ) at 2016-08-29 10:32 EDT
Nmap scan report for (10.1.0.123)
Host is up (0.00085s latency).
PORT     STATE    SERVICE
2377/tcp filtered unknown
MAC Address: 00:50:56:B9:76:32
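
A `filtered` state from nmap usually means a firewall is dropping the probe before it reaches a listening process, which would be consistent with the join timing out. A hedged sketch of opening the ports Docker documents for swarm mode (2377/tcp for cluster management, 7946/tcp and 7946/udp for node-to-node communication, 4789/udp for overlay network traffic). It assumes firewalld is the active firewall, which is the CentOS 7 default but is not confirmed anywhere in this post; `swarm_firewall_cmds` is a hypothetical helper that only prints the commands so they can be reviewed first:

```shell
# Hypothetical helper: print (do not run) the firewall-cmd calls that open
# the swarm-mode ports. Assumes firewalld; adapt for plain iptables.
swarm_firewall_cmds() {
    for p in 2377/tcp 7946/tcp 7946/udp 4789/udp; do
        echo "firewall-cmd --permanent --add-port=$p"
    done
    echo "firewall-cmd --reload"
}

# Usage (as root, after reviewing the output):
#   swarm_firewall_cmds | sh
```

Port 2377 only needs to be reachable on manager nodes; the other three are used between all nodes in the swarm.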

On the manager node:

(3)# docker swarm leave --force
Error response from daemon: This node is not part of a swarm
(4)# service docker stop
Redirecting to /bin/systemctl stop  docker.service
(5)# service docker start
Redirecting to /bin/systemctl start  docker.service
(7)# docker swarm init --advertise-addr 10.1.1.40 --force-new-cluster
Swarm initialized: current node (7z52d3bcoiou61ltgike42dnn) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-4gsdy8jshxmd58mvpcm0tlmbbnrrqdrf51ggcwvdv0bvkltxmy-am9o4dsl4ovx6b4lbsabn0fc7 \
    10.1.1.40:2377

Docker 1.12.1: after swarm init, worker is unable to join swarm
