如何在docker swarm上设置zookeeper集群

时间:2017-02-06 07:42:32

标签: docker apache-zookeeper docker-swarm

环境:6个服务器泊坞群群集(2个主人和4个工作人员)

要求:我们需要在现有的docker swarm上设置zookeeper群集。

阻止:要在群集中设置zookeeper,我们需要在每个服务器配置中提供所有zk服务器,并在myid文件中提供唯一ID。

问题:当我们在docker swarm中创建zookeeper的副本时,我们如何为每个副本提供唯一的ID。另外,我们如何使用每个zookeeper容器的ID更新zoo.cfg配置文件。

3 个答案:

答案 0 :(得分:8)

目前这不是一个简单的问题。当每个集群成员都需要唯一的标识和存储卷时,完全可扩展的有状态应用程序集群很棘手。

在Docker Swarm上,今天,最好建议您在撰写文件中将每个集群成员作为单独的服务运行(参见31z4/zookeeper-docker):

version: '2'
services:
    zoo1:
        image: 31z4/zookeeper
        restart: always
        ports:
            - 2181:2181
        environment:
            ZOO_MY_ID: 1
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888

    zoo2:
        image: 31z4/zookeeper
        restart: always
        ports:
            - 2182:2181
        environment:
            ZOO_MY_ID: 2
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
..
..

对于最先进(但仍在不断发展)的解决方案,我建议您查看Kubernetes:

Statefulsets的新概念提供了许多希望。我希望Docker Swarm能够及时增加类似功能,为每个容器实例分配一个唯一的“粘性”主机名,可以作为唯一标识符的基础。

答案 1 :(得分:2)

我们已经创建了一个扩展官方图像的docker图像。 entrypoint.sh已被修改,以便在每个容器启动时,它会自动发现其余的zookeeper节点并适当地配置当前节点。

您可以在docker store和我们的github中找到该图片。

注意:目前它不处理重新创建容器导致失败等情况。

编辑(6/11/2018)

最新图像支持在以下情况下重新配置zookeeper群集:

  • 扩展docker服务(添加更多容器)
  • 缩小docker服务(删除容器)
  • 通过docker swarm重新安排容器导致故障(分配新IP)

答案 2 :(得分:0)

我一直在尝试以docker swarm模式部署Zookeeper集群。

我已经部署了3台连接到docker swarm网络的机器。我的要求是,尝试在每个节点上运行3个Zookeeper实例,形成一个整体。 经历了这个线程,对如何在docker swarm中部署Zookeeper的见解很少。

按照@junius的建议,我创建了docker compose文件。 我已经删除了约束,因为docker swarm忽略了它。请参阅https://forums.docker.com/t/docker-swarm-constraints-being-ignored/31555

我的Zookeeper docker撰写文件如下

version: '3.3'

services:
    zoo1:
        image: zookeeper:3.4.12
        hostname: zoo1
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 1
            ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /etc/localtime:/etc/localtime:ro
    zoo2:
        image: zookeeper:3.4.12
        hostname: zoo2
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 2
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /etc/localtime:/etc/localtime:ro
    zoo3:
        image: zookeeper:3.4.12
        hostname: zoo3
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 3
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /etc/localtime:/etc/localtime:ro
networks:
    net:

使用docker stack命令部署。

  

docker stack deploy -c zoo3.yml zk   创建网络zk_net   创建服务zk_zoo3   创建服务zk_zoo1   创建服务zk_zoo2

Zookeeper服务运行良好,每个节点中的每个服务都没有问题。

  

docker stack services zk   ID名称模式副本图像端口   rn7t5f3tu0r4 zk_zoo1已复制1/1 zookeeper:3.4.12 0.0.0.0:2181->2181/tcp、0.0.0.0:2888->2888/tcp、0.0.0.0:3888->3888/tcp   u51r7bjwwm03 zk_zoo2已复制1/1 zookeeper:3.4.12 0.0.0.0:2181->2181/tcp、0.0.0.0:2888->2888/tcp、0.0.0.0:3888->3888/tcp   zlbcocid57xz zk_zoo3复制了1/1 zookeeper:3.4.12 0.0.0.0:2181-> 2181 / tcp,0.0.0.0:2888-> 2888 / tcp,0.0.0.0:3888-> 3888 / tcp

当我停止并再次启动Zookeeper堆栈时,我已重现了此处讨论的问题。

  

docker stack rm zk   docker stack deploy -c zoo3.yml zk

这次没有形成Zookeeper集群。 Docker实例记录了以下内容

ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
2018-11-02 15:24:41,531 [myid:2] - WARN  [WorkerSender[myid=2]:QuorumCnxManager@584] - Cannot open channel to 1 at election address zoo1/10.0.0.4:3888
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:435)
        at java.lang.Thread.run(Thread.java:748)
2018-11-02 15:24:41,538 [myid:2] - WARN  [WorkerSender[myid=2]:QuorumCnxManager@584] - Cannot open channel to 3 at election address zoo3/10.0.0.2:3888
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:435)
        at java.lang.Thread.run(Thread.java:748)
2018-11-02 15:38:19,146 [myid:2] - WARN  [QuorumPeer[myid=2]/0.0.0.0:2181:Learner@237] - Unexpected exception, tries=1, connecting to /0.0.0.0:2888
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:229)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:981)
2018-11-02 15:38:20,147 [myid:2] - WARN  [QuorumPeer[myid=2]/0.0.0.0:2181:Learner@237] - Unexpected exception, tries=2, connecting to /0.0.0.0:2888
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:229)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:981)

仔细观察发现,当我第一次部署此堆栈时,标识为2的ZooKeeper实例在节点1上运行。这创建了一个值为2的myid文件。

  

cat / home / zk / data / myid   2

当我停止并再次启动堆栈时,我发现这次ID为3的ZooKeeper实例在节点1上运行。

  

docker ps   容器ID图像命令创建的状态端口名称   566b68c11c8b zookeeper:3.4.12“ / docker-entrypoin ...” 6分钟前向上6分钟0.0.0.0:2181->2181/tcp,0.0.0.0:2888->2888/tcp,0.0.0.0:3888-> 3888 / tcp zk_zoo3.1.7m0hq684pkmyrm09zmictc5bm

但是myid文件仍然具有值2,这是由较早的实例设置的。

因为该日志显示[myid:2],并且它尝试连接到ID为1和3的实例,但失败了。

在进一步调试中发现docker-entrypoint.sh文件包含以下代码

# Write myid only if it doesn't exist
if [[ ! -f "$ZOO_DATA_DIR/myid" ]]; then
    echo "${ZOO_MY_ID:-1}" > "$ZOO_DATA_DIR/myid"
fi

这对我造成了问题。我用以下命令编辑了docker-entrypoint.sh,

if [[ -f "$ZOO_DATA_DIR/myid" ]]; then
    rm "$ZOO_DATA_DIR/myid"
fi

echo "${ZOO_MY_ID:-1}" > "$ZOO_DATA_DIR/myid"

然后将docker-entrypoint.sh挂载到我的撰写文件中。

有了此修复程序,我能够多次停止和启动堆栈,并且每次我的Zookeeper集群能够形成整体而不会遇到连接问题。

我的docker-entrypoint.sh文件如下

#!/bin/bash

set -e

# Allow the container to be started with `--user`
if [[ "$1" = 'zkServer.sh' && "$(id -u)" = '0' ]]; then
    chown -R "$ZOO_USER" "$ZOO_DATA_DIR" "$ZOO_DATA_LOG_DIR"
    exec su-exec "$ZOO_USER" "$0" "$@"
fi

# Generate the config only if it doesn't exist
if [[ ! -f "$ZOO_CONF_DIR/zoo.cfg" ]]; then
    CONFIG="$ZOO_CONF_DIR/zoo.cfg"

    echo "clientPort=$ZOO_PORT" >> "$CONFIG"
    echo "dataDir=$ZOO_DATA_DIR" >> "$CONFIG"
    echo "dataLogDir=$ZOO_DATA_LOG_DIR" >> "$CONFIG"

    echo "tickTime=$ZOO_TICK_TIME" >> "$CONFIG"
    echo "initLimit=$ZOO_INIT_LIMIT" >> "$CONFIG"
    echo "syncLimit=$ZOO_SYNC_LIMIT" >> "$CONFIG"

    echo "maxClientCnxns=$ZOO_MAX_CLIENT_CNXNS" >> "$CONFIG"

    for server in $ZOO_SERVERS; do
        echo "$server" >> "$CONFIG"
    done
fi

if [[ -f "$ZOO_DATA_DIR/myid" ]]; then
    rm "$ZOO_DATA_DIR/myid"
fi

echo "${ZOO_MY_ID:-1}" > "$ZOO_DATA_DIR/myid"

exec "$@"

我的docker撰写文件如下

version: '3.3'

services:
    zoo1:
        image: zookeeper:3.4.12
        hostname: zoo1
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 1
            ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /home/zk/docker-entrypoint.sh:/docker-entrypoint.sh
            - /etc/localtime:/etc/localtime:ro
    zoo2:
        image: zookeeper:3.4.12
        hostname: zoo2
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 2
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /home/zk/docker-entrypoint.sh:/docker-entrypoint.sh
            - /etc/localtime:/etc/localtime:ro
    zoo3:
        image: zookeeper:3.4.12
        hostname: zoo3
        ports:
            - target: 2181
              published: 2181
              protocol: tcp
              mode: host
            - target: 2888
              published: 2888
              protocol: tcp
              mode: host
            - target: 3888
              published: 3888
              protocol: tcp
              mode: host
        networks:
            - net
        deploy:
            restart_policy:
                condition: on-failure
        environment:
            ZOO_MY_ID: 3
            ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888
        volumes:
            - /home/zk/data:/data
            - /home/zk/datalog:/datalog
            - /home/zk/docker-entrypoint.sh:/docker-entrypoint.sh
            - /etc/localtime:/etc/localtime:ro
networks:
    net:

有了这个,我能够使用swarm模式启动Zookeeper实例并在docker中运行,而无需在撰写文件中硬编码任何主机名。如果我的一个节点发生故障,服务将在swarm上的任何可用节点上启动,而不会出现任何问题。

谢谢