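Redis Sentinel in swarm mode has a problem detecting the slave

Time: 2020-02-23 12:52:24

Tags: docker redis docker-compose docker-swarm redis-sentinel

I am trying to build a simple Redis Sentinel demo with Docker and swarm.

There are two nodes: node1 (the swarm manager) and node2. Node1 runs a Redis master and a sentinel; node2 runs a Redis slave.

Here is my docker-compose file (node labels control where the containers are placed):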

version: "3.3"
services:
  master:
    image: "redis:5.0.7"
    deploy:
      mode: global
      placement:
        constraints: [node.labels.redismaster == true]
    networks:
      myredisnet:
    command: redis-server /etc/redis.conf
    volumes:
      - "~/redis.conf:/etc/redis.conf"
  salve:
    image: "redis:5.0.7"
    deploy:
      mode: global
      placement:
        constraints: [node.labels.redisslave1 == true]
    networks:
      myredisnet:
    command: redis-server /etc/redis-slave.conf
    volumes:
      - "~/redis-slave.conf:/etc/redis-slave.conf"
  sentinel:
    image: "redis:5.0.7"
    ports:
      - "26379:26379"
    volumes:
      - "~/sentinel.conf:/usr/local/bin/sentinel.conf"
    deploy:
      mode: global
      placement:
        constraints: [node.labels.redismaster == true]
    networks:
      myredisnet:
    command: redis-sentinel /usr/local/bin/sentinel.conf
networks:
  myredisnet:
    driver: overlay

My redis conf and redis-slave conf files are the same, except that the redis-slave file also contains slaveof master 6379 (master is the service name in the docker-compose file):

bind 0.0.0.0
protected-mode yes
masterauth redispass
requirepass redispass

Here is my sentinel conf file:

port 26379
logfile "/var/log/sentinel.log"
protected-mode no
dir "/root"
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster master 6379 1
sentinel auth-pass mymaster redispass

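After I deploy these services with docker stack deploy -c docker-compose.yml redis, everything looks fine, and the Redis master and slave are set up correctly.

But the sentinel seems to have a problem. When I open a shell in the sentinel container (docker exec -it) and look at the sentinel log: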

root@d2fe4dc7ffa4:/data# cat /var/log/sentinel.log 
1:X 23 Feb 2020 11:22:25.114 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 23 Feb 2020 11:22:25.114 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 23 Feb 2020 11:22:25.114 # Configuration loaded
1:X 23 Feb 2020 11:22:25.115 * Running mode=sentinel, port=26379.
1:X 23 Feb 2020 11:22:25.115 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:X 23 Feb 2020 11:22:25.116 # Sentinel ID is 1f9c8c8f688f0a9925dad749fea86c196781f6bf
1:X 23 Feb 2020 11:22:25.116 # +monitor master mymaster 10.0.9.2 6379 quorum 1
1:X 23 Feb 2020 11:22:25.118 * +slave slave 10.0.9.7:6379 10.0.9.7 6379 @ mymaster 10.0.9.2 6379
1:X 23 Feb 2020 11:22:55.168 # +sdown slave 10.0.9.7:6379 10.0.9.7 6379 @ mymaster 10.0.9.2 6379
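
The event lines in this log follow Sentinel's documented layout: event name, instance type, instance name, IP, port, then "@" and the master's name, IP and port. Below is a minimal sketch, assuming Python 3 is at hand, of pulling those fields out so the addresses can be compared with other output. The names EVENT_RE, SentinelEvent and parse_sentinel_event are made up for this illustration. Note that the 30-second gap between +slave and +sdown is consistent with Sentinel's default down-after-milliseconds of 30000, which applies here because the conf file does not set it.

```python
import re
from typing import NamedTuple, Optional

# Matches Sentinel event lines such as:
#   1:X 23 Feb 2020 11:22:55.168 # +sdown slave 10.0.9.7:6379 10.0.9.7 6379 @ mymaster 10.0.9.2 6379
# The "@ master ..." part is optional (it is absent on +monitor lines).
EVENT_RE = re.compile(
    r"^\d+:X (?P<time>\d+ \w+ \d+ [\d:.]+) [#*] "
    r"(?P<event>[+-][\w-]+) (?P<kind>\w+) (?P<name>\S+) "
    r"(?P<ip>\S+) (?P<port>\d+)"
    r"(?: @ (?P<master>\S+) (?P<master_ip>\S+) (?P<master_port>\d+))?"
)


class SentinelEvent(NamedTuple):
    time: str
    event: str                 # e.g. "+slave", "+sdown"
    kind: str                  # e.g. "master", "slave"
    ip: str
    port: int
    master_ip: Optional[str]   # None when the line has no "@ ..." part


def parse_sentinel_event(line: str) -> Optional[SentinelEvent]:
    """Return the parsed event, or None for non-event lines (banner, warnings)."""
    m = EVENT_RE.match(line.strip())
    if m is None:
        return None
    return SentinelEvent(
        m["time"], m["event"], m["kind"], m["ip"], int(m["port"]), m["master_ip"]
    )
```

Feeding each line of /var/log/sentinel.log through parse_sentinel_event and keeping the non-None results gives the list of instances and addresses Sentinel believes in.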

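As you can see, the sentinel considers the slave to be down. What confuses me is that the sentinel detected the slave's IP as 10.0.9.7. On node2, docker inspect shows that the slave container's IP should be 10.0.9.6: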

on node2:
[root@node02 ~]# docker inspect 4ba57e6fd395
...
"Networks": {
                "redis_myredisnet": {
                    "IPAMConfig": {
                        "IPv4Address": "10.0.9.6"
                    },
                    "Links": null,
                    "Aliases": [
                        "4ba57e6fd395"
                    ],
                    "NetworkID": "ziry6mb6fkz5ido2cg9j86t6a",
                    "EndpointID": "771da42d9d7dc03ecb3892d2c3cdf83be97268625b0ee24f0fa3ffb6c2377b6d",
                    "Gateway": "",
                    "IPAddress": "10.0.9.6",
                    "IPPrefixLen": 24,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:0a:00:09:06",
                    "DriverOpts": null
                }
            }
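
To avoid reading the address out of the JSON by eye, here is a minimal sketch that extracts the per-network IP from docker inspect output. It assumes the usual container layout, in which the "Networks" object shown above sits under "NetworkSettings" (the post elides that part with "..."), and the function name overlay_addresses is hypothetical.

```python
import json
from typing import Dict


def overlay_addresses(inspect_json: str) -> Dict[str, str]:
    """Map each attached network name to the container's IP address on it.

    `inspect_json` is the raw text printed by `docker inspect <container>`,
    which is a JSON array with one object per inspected container.
    """
    containers = json.loads(inspect_json)
    # Assumption: the "Networks" object lives under "NetworkSettings".
    networks = containers[0]["NetworkSettings"]["Networks"]
    return {name: cfg["IPAddress"] for name, cfg in networks.items()}
```

For the container above this would be expected to yield a mapping from redis_myredisnet to 10.0.9.6.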


When I open a shell in the Redis master container (docker exec -it) and run redis-cli, then auth redispass and info, to check the replication information on node1:

# Replication
role:master
connected_slaves:1
slave0:ip=10.0.9.7,port=6379,state=online,offset=224628,lag=1
master_replid:f4bf3ba64df96919b6e9cd4e0935ace6d31b0ba6
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:224759
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:224759
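
Sentinel discovers slaves from the master's INFO output, so the slave0 line here is the source of the 10.0.9.7 address in the sentinel log. Unless the slave sets replica-announce-ip (slave-announce-ip in older configs), the master reports the peer address of the replication connection as it sees it, which need not be the container's own address. A minimal sketch for extracting the slaveN entries from the INFO text follows; parse_replicas is a name chosen for this illustration.

```python
from typing import Dict, List


def parse_replicas(info_text: str) -> List[Dict[str, str]]:
    """Extract the slaveN entries from `INFO replication` output."""
    replicas = []
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Replica entries are keyed slave0, slave1, ...; skip everything else,
        # including section headers such as "# Replication".
        if sep and key.startswith("slave") and key[5:].isdigit():
            # value looks like "ip=10.0.9.7,port=6379,state=online,offset=...,lag=1"
            replicas.append(dict(item.split("=", 1) for item in value.split(",")))
    return replicas
```

Comparing the "ip" field of each entry with the container address from docker inspect makes the mismatch explicit.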

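As you can see, the slave IP is also reported as slave0:ip=10.0.9.7. So I ran a small experiment: I installed telnet (apt-get update; apt-get install telnet) and tried telnet 10.0.9.7 6379 from inside my Redis master container: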

root@d2fe4dc7ffa4:/data# telnet 10.0.9.7 6379
Trying 10.0.9.7...
telnet: Unable to connect to remote host: Connection refused

I also tested telnet 10.0.9.6 6379:

root@d2fe4dc7ffa4:/data# telnet 10.0.9.6 6379
Trying 10.0.9.6...
Connected to 10.0.9.6.
Escape character is '^]'.
auth redispass
+OK

In addition, I ran docker inspect (slave service name); this is the slave service's VIP:

[root@node03 ~]# docker inspect redis_salve
...
 "Endpoint": {
            "Spec": {
                "Mode": "vip"
            },
            "VirtualIPs": [
                {
                    "NetworkID": "ziry6mb6fkz5ido2cg9j86t6a",
                    "Addr": "10.0.9.5/24"
                }
            ]
        }


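So where does the IP 10.0.9.7 come from? My sentinel service also seems to be broken: when I pause the Redis master container, the sentinel does not fail over to the slave.

Also, here is the sentinel conf file after the sentinel service has been running: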

[root@node03 ~]# cat sentinel.conf 
port 26379
logfile "/var/log/sentinel.log"
protected-mode no
dir "/root"
sentinel myid 1f9c8c8f688f0a9925dad749fea86c196781f6bf
sentinel deny-scripts-reconfig yes
# Generated by CONFIG REWRITE
sentinel monitor mymaster 10.0.9.2 6379 1
sentinel auth-pass mymaster redispass
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
sentinel known-replica mymaster 10.0.9.7 6379
sentinel current-epoch 0
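
Compared with the original file, the rewritten one shows that Sentinel replaced the hostname master with the resolved address 10.0.9.2 and persisted 10.0.9.7 as the known replica. A minimal sketch for listing the addresses Sentinel has stored, so they can be checked against the container addresses; sentinel_addresses is a name invented for this example.

```python
from typing import Dict, List


def sentinel_addresses(conf_text: str) -> Dict[str, List[str]]:
    """Collect host:port pairs from monitor and known-replica directives."""
    found: Dict[str, List[str]] = {"monitor": [], "known-replica": []}
    for line in conf_text.splitlines():
        parts = line.split()
        # Directive layout: sentinel <directive> <master-name> <ip> <port> [...]
        if len(parts) >= 5 and parts[0] == "sentinel" and parts[1] in found:
            found[parts[1]].append(f"{parts[3]}:{parts[4]}")
    return found
```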

Any help would be greatly appreciated!

0 answers:

No answers