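Environment: a 6-server Docker Swarm cluster (2 managers and 4 workers)
Requirement: we need to set up a ZooKeeper cluster on the existing Docker Swarm.
Blocker: to run ZooKeeper as a cluster, every server's configuration must list all ZK servers, and each server needs a unique ID in its myid file.
Question: when we create replicas of ZooKeeper in Docker Swarm, how do we give each replica a unique ID? Also, how do we update the zoo.cfg config file with the ID of each ZooKeeper container?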
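To make the requirement concrete, here is a minimal sketch of the two per-node artifacts the question refers to. The function name `render_zk_node`, the `zooN` hostnames and the directory layout are assumptions for illustration only; a real zoo.cfg also needs settings such as dataDir and clientPort, which are left out here.

```sh
# Write the two per-node artifacts described in the question into a directory:
#   myid     a single line holding this server's ID
#   zoo.cfg  one "server.N=host:2888:3888" line per ensemble member
# Hostnames zoo1..zooN are hypothetical; only the server lines are generated.
render_zk_node() (
    dir=$1; my_id=$2; count=$3
    mkdir -p "$dir"
    printf '%s\n' "$my_id" > "$dir/myid"
    : > "$dir/zoo.cfg"                      # start from an empty file
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'server.%s=zoo%s:2888:3888\n' "$i" "$i" >> "$dir/zoo.cfg"
        i=$((i + 1))
    done
)
```

Calling `render_zk_node /tmp/zk-demo 2 3` would produce a myid file containing 2 and a zoo.cfg fragment with three server lines. The server list is identical on every node; only myid differs, which is exactly what a plain replica count cannot express.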
Answer 0 (score: 8)
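This is currently not a simple thing to ask for. Fully scalable stateful application clusters are tricky when each cluster member needs a unique identity and its own storage volume.
On Docker Swarm, today, you are probably best advised to run each cluster member as a separate service in your compose file (see 31z4/zookeeper-docker):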
version: '2'
services:
  zoo1:
    image: 31z4/zookeeper
    restart: always
    ports:
      - 2181:2181
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
  zoo2:
    image: 31z4/zookeeper
    restart: always
    ports:
      - 2182:2181
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
..
..
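Since the ZOO_SERVERS value is the same for every service and grows with the ensemble, it can be generated rather than typed by hand. The following is only a sketch: the helper name `zoo_servers` is made up, and it assumes the services are named zoo1..zooN and use ports 2888 and 3888 as in the compose file above.

```sh
# Print a ZOO_SERVERS value for an ensemble of N members named zoo1..zooN.
# The naming scheme and the ports mirror the compose file; adjust if yours differ.
zoo_servers() (
    count=$1
    out=""
    i=1
    while [ "$i" -le "$count" ]; do
        # ${out:+ } adds a separating space only once out is non-empty
        out="${out}${out:+ }server.${i}=zoo${i}:2888:3888"
        i=$((i + 1))
    done
    printf '%s\n' "$out"
)
```

For example, `zoo_servers 3` prints the exact string used for ZOO_SERVERS in the services above.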
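For a state-of-the-art (but still evolving) solution, I recommend taking a look at Kubernetes:
The new concept of StatefulSets offers a lot of promise. I expect Docker Swarm will grow a similar capability in time, where each container instance is assigned a unique and "sticky" hostname that can serve as the basis for a unique identifier.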
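As an illustration of how a sticky hostname could become a ZooKeeper ID, here is a small sketch. It assumes hostnames of the form `<name>-<ordinal>` with ordinals starting at 0, which is how StatefulSet pods are named; the function name `myid_from_hostname` is hypothetical.

```sh
# Derive a ZooKeeper ID from a hostname ending in "-<ordinal>" (e.g. zk-0).
# Ordinals start at 0 while ZooKeeper IDs start at 1, hence the +1.
myid_from_hostname() (
    host=$1
    ordinal=${host##*-}                     # text after the last dash
    case $ordinal in
        ''|*[!0-9]*)
            echo "no numeric ordinal in '$host'" >&2
            exit 1
            ;;
    esac
    echo $((ordinal + 1))
)
```

An entrypoint could then call `myid_from_hostname "$(hostname)"` and write the result to the myid file, so that a restarted instance always comes back with the same ID.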
Answer 1 (score: 2)
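We have created a Docker image that extends the official one. Its entrypoint.sh has been modified so that, when each container starts, it automatically discovers the other ZooKeeper nodes and configures the current node accordingly.
You can find the image in the Docker Store and on our GitHub.
Note: it currently does not handle cases such as a container being re-created after a failure.
The latest image supports reconfiguring the ZooKeeper cluster in the following cases: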
Answer 2 (score: 0)
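I have been trying to deploy a ZooKeeper cluster in Docker swarm mode.
I have deployed 3 machines connected to a Docker Swarm network. My requirement is to run 3 ZooKeeper instances, one on each of these nodes, forming an ensemble. Going through this thread gave me a few insights into how to deploy ZooKeeper in Docker Swarm.
As @junius suggested, I created a docker compose file. I removed the constraints, because Docker Swarm was ignoring them. See https://forums.docker.com/t/docker-swarm-constraints-being-ignored/31555
My ZooKeeper docker compose file looks like this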
version: '3.3'
services:
  zoo1:
    image: zookeeper:3.4.12
    hostname: zoo1
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    networks:
      - net
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    volumes:
      - /home/zk/data:/data
      - /home/zk/datalog:/datalog
      - /etc/localtime:/etc/localtime:ro
  zoo2:
    image: zookeeper:3.4.12
    hostname: zoo2
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    networks:
      - net
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
    volumes:
      - /home/zk/data:/data
      - /home/zk/datalog:/datalog
      - /etc/localtime:/etc/localtime:ro
  zoo3:
    image: zookeeper:3.4.12
    hostname: zoo3
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    networks:
      - net
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888
    volumes:
      - /home/zk/data:/data
      - /home/zk/datalog:/datalog
      - /etc/localtime:/etc/localtime:ro
networks:
  net:
I deployed it with the docker stack command.
docker stack deploy -c zoo3.yml zk
Creating network zk_net
Creating service zk_zoo3
Creating service zk_zoo1
Creating service zk_zoo2
The ZooKeeper services came up fine, one on each node, without any issues.
docker stack services zk
ID            NAME     MODE        REPLICAS  IMAGE             PORTS
rn7t5f3tu0r4  zk_zoo1  replicated  1/1       zookeeper:3.4.12  0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp
u51r7bjwwm03  zk_zoo2  replicated  1/1       zookeeper:3.4.12  0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp
zlbcocid57xz  zk_zoo3  replicated  1/1       zookeeper:3.4.12  0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp
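I reproduced the issue discussed here when I stopped the ZooKeeper stack and started it again.
docker stack rm zk
docker stack deploy -c zoo3.yml zk
This time the ZooKeeper cluster did not form. The Docker instance logged the following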
ZooKeeper JMX enabled by default
Using config: /conf/zoo.cfg
2018-11-02 15:24:41,531 [myid:2] - WARN [WorkerSender[myid=2]:QuorumCnxManager@584] - Cannot open channel to 1 at election address zoo1/10.0.0.4:3888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:435)
at java.lang.Thread.run(Thread.java:748)
2018-11-02 15:24:41,538 [myid:2] - WARN [WorkerSender[myid=2]:QuorumCnxManager@584] - Cannot open channel to 3 at election address zoo3/10.0.0.2:3888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:558)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:534)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:454)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:435)
at java.lang.Thread.run(Thread.java:748)
2018-11-02 15:38:19,146 [myid:2] - WARN [QuorumPeer[myid=2]/0.0.0.0:2181:Learner@237] - Unexpected exception, tries=1, connecting to /0.0.0.0:2888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:229)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:981)
2018-11-02 15:38:20,147 [myid:2] - WARN [QuorumPeer[myid=2]/0.0.0.0:2181:Learner@237] - Unexpected exception, tries=2, connecting to /0.0.0.0:2888
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:229)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:981)
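When a log like the one above runs to many pages, it helps to reduce it to the list of peers that could not be reached. The filter below is a rough sketch that keys on the "Cannot open channel to N at election address" wording seen in this log; the function name `unreachable_peers` is made up, and other ZooKeeper versions may phrase the message differently.

```sh
# Read a ZooKeeper log on stdin and print the distinct peer IDs for which
# the election connection could not be opened, one ID per line.
unreachable_peers() {
    sed -n 's/.*Cannot open channel to \([0-9][0-9]*\) at election address.*/\1/p' |
        sort -un
}
```

It could be fed from a saved log file, or from something like `docker service logs zk_zoo2 2>&1 | unreachable_peers`. For the log above it would list peers 1 and 3, which matches the later finding that the instance was running with a stale ID.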
On closer inspection I found that when I first deployed this stack, the ZooKeeper instance with ID 2 was running on node 1. This created a myid file with the value 2.
cat /home/zk/data/myid
2
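When I stopped the stack and started it again, I found that this time the ZooKeeper instance with ID 3 was running on node 1.
docker ps
CONTAINER ID  IMAGE             COMMAND                CREATED        STATUS        PORTS                                                                   NAMES
566b68c11c8b  zookeeper:3.4.12  "/docker-entrypoin..."  6 minutes ago  Up 6 minutes  0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp  zk_zoo3.1.7m0hq684pkmyrm09zmictc5bm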
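But the myid file still had the value 2, which was set by the earlier instance.
That is why the log shows [myid:2]: the instance believes it is server 2, tries to connect to the instances with IDs 1 and 3, and fails.
Debugging further, I found that the docker-entrypoint.sh file contains the following code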
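A mismatch like this can be caught before ZooKeeper starts. The check below is only a sketch: `check_myid` is a hypothetical helper, and the arguments stand in for the data directory and the ZOO_MY_ID value that the container receives.

```sh
# Compare the ID stored on disk with the ID this container was asked to use.
# Succeeds when they match or when no myid file exists yet; fails on a mismatch.
check_myid() (
    data_dir=$1; expected=$2
    [ -f "$data_dir/myid" ] || exit 0       # nothing stored yet, nothing to compare
    actual=$(cat "$data_dir/myid")
    if [ "$actual" != "$expected" ]; then
        echo "myid mismatch: file has $actual, ZOO_MY_ID is $expected" >&2
        exit 1
    fi
)
```

On node 1 in the situation described here, `check_myid /home/zk/data 3` would fail and report that the file still holds 2.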
# Write myid only if it doesn't exist
if [[ ! -f "$ZOO_DATA_DIR/myid" ]]; then
    echo "${ZOO_MY_ID:-1}" > "$ZOO_DATA_DIR/myid"
fi
This is what was causing the problem for me. I edited docker-entrypoint.sh so that this part reads as follows,
# Always rewrite myid so it matches the ZOO_MY_ID of the instance started on this node
if [[ -f "$ZOO_DATA_DIR/myid" ]]; then
    rm "$ZOO_DATA_DIR/myid"
fi
echo "${ZOO_MY_ID:-1}" > "$ZOO_DATA_DIR/myid"
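and then mounted docker-entrypoint.sh in my compose file.
With this fix I am able to stop and start my stack multiple times, and every time my ZooKeeper cluster forms an ensemble without running into the connection issue.
My docker-entrypoint.sh file is as follows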
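The essence of the fix can be reduced to a single unconditional write. This is a simplified sketch rather than the answer's exact code: `write_myid` is a hypothetical helper, and it relies on the fact that the `>` redirection truncates an existing file, so the separate rm is not strictly required.

```sh
# Always write the requested ID, replacing whatever a previous instance left behind.
# Falls back to 1 when no ID is given, like ${ZOO_MY_ID:-1} in the entrypoint.
write_myid() (
    data_dir=$1; my_id=${2:-1}
    printf '%s\n' "$my_id" > "$data_dir/myid"   # ">" overwrites any existing file
)
```

In an entrypoint this would be called as `write_myid "$ZOO_DATA_DIR" "$ZOO_MY_ID"`. One thing to keep in mind, as an observation rather than something stated in the original answer: the rest of the data directory on that node may still hold state written under the previous ID.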
#!/bin/bash
set -e
# Allow the container to be started with `--user`
if [[ "$1" = 'zkServer.sh' && "$(id -u)" = '0' ]]; then
    chown -R "$ZOO_USER" "$ZOO_DATA_DIR" "$ZOO_DATA_LOG_DIR"
    exec su-exec "$ZOO_USER" "$0" "$@"
fi
# Generate the config only if it doesn't exist
if [[ ! -f "$ZOO_CONF_DIR/zoo.cfg" ]]; then
    CONFIG="$ZOO_CONF_DIR/zoo.cfg"
    echo "clientPort=$ZOO_PORT" >> "$CONFIG"
    echo "dataDir=$ZOO_DATA_DIR" >> "$CONFIG"
    echo "dataLogDir=$ZOO_DATA_LOG_DIR" >> "$CONFIG"
    echo "tickTime=$ZOO_TICK_TIME" >> "$CONFIG"
    echo "initLimit=$ZOO_INIT_LIMIT" >> "$CONFIG"
    echo "syncLimit=$ZOO_SYNC_LIMIT" >> "$CONFIG"
    echo "maxClientCnxns=$ZOO_MAX_CLIENT_CNXNS" >> "$CONFIG"
    for server in $ZOO_SERVERS; do
        echo "$server" >> "$CONFIG"
    done
fi
# Always rewrite myid so it matches the ZOO_MY_ID of the instance started on this node
if [[ -f "$ZOO_DATA_DIR/myid" ]]; then
    rm "$ZOO_DATA_DIR/myid"
fi
echo "${ZOO_MY_ID:-1}" > "$ZOO_DATA_DIR/myid"
exec "$@"
My docker compose file is as follows
version: '3.3'
services:
  zoo1:
    image: zookeeper:3.4.12
    hostname: zoo1
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    networks:
      - net
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    volumes:
      - /home/zk/data:/data
      - /home/zk/datalog:/datalog
      - /home/zk/docker-entrypoint.sh:/docker-entrypoint.sh
      - /etc/localtime:/etc/localtime:ro
  zoo2:
    image: zookeeper:3.4.12
    hostname: zoo2
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    networks:
      - net
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=0.0.0.0:2888:3888 server.3=zoo3:2888:3888
    volumes:
      - /home/zk/data:/data
      - /home/zk/datalog:/datalog
      - /home/zk/docker-entrypoint.sh:/docker-entrypoint.sh
      - /etc/localtime:/etc/localtime:ro
  zoo3:
    image: zookeeper:3.4.12
    hostname: zoo3
    ports:
      - target: 2181
        published: 2181
        protocol: tcp
        mode: host
      - target: 2888
        published: 2888
        protocol: tcp
        mode: host
      - target: 3888
        published: 3888
        protocol: tcp
        mode: host
    networks:
      - net
    deploy:
      restart_policy:
        condition: on-failure
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=0.0.0.0:2888:3888
    volumes:
      - /home/zk/data:/data
      - /home/zk/datalog:/datalog
      - /home/zk/docker-entrypoint.sh:/docker-entrypoint.sh
      - /etc/localtime:/etc/localtime:ro
networks:
  net:
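With this, I am able to get the ZooKeeper instances up and running in Docker using swarm mode, without hard-coding any hostname in the compose file. If one of my nodes goes down, the services are started on any available node in the swarm without any issues.
Thanks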