Redis Sentinel failover not working

Time: 2018-02-02 09:09:00

Tags: caching redis failover

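I have set up Redis Sentinel with three Redis servers on ports 7000, 7001 and 7002 (one master and two slaves) and three sentinels on ports 26379, 26380 and 26381, all on the same machine (an Ubuntu VM).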

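When I start them, everything looks fine according to the logs, and running the INFO command against the sentinels also shows a healthy state. But when I take the master down (stopping it with Ctrl+C or the redis-cli SLEEP command), none of the slave instances is promoted to be the new master, and the sentinels keep trying to nominate and connect to the already dead master instance! My configuration is as follows: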

Master:

port 7000      
protected-mode no

Slave #1:

port 7001
slaveof 10.75.196.216 7000

Slave #2:

port 7002
slaveof 10.75.196.216 7000

Sentinel#1:

port 26379
protected-mode no

sentinel myid bdddadb6e825065398be0bae214891d7ccbd6e2a
sentinel monitor themaster 10.75.196.216 7000 2
sentinel down-after-milliseconds themaster 3000
sentinel failover-timeout themaster 5000
sentinel parallel-syncs themaster 2
sentinel config-epoch themaster 0

# Generated by CONFIG REWRITE
dir "/home/bob/app/sentinel-test/master"
sentinel leader-epoch themaster 322
sentinel known-slave themaster 10.75.196.216 7002
sentinel known-slave themaster 10.75.196.216 7001
sentinel known-sentinel themaster 10.75.196.216 26380 181fb84351d6b96e0120bfa68331738ef111c49f
sentinel known-sentinel themaster 10.75.196.216 26381 8497ee90c1e4525c0f957407fefa77427f427e0d
sentinel current-epoch 322

Sentinel#2:

port 26380
protected-mode no

sentinel myid 181fb84351d6b96e0120bfa68331738ef111c49f
sentinel monitor themaster 10.75.196.216 7000 2
sentinel down-after-milliseconds themaster 3000
sentinel failover-timeout themaster 5000
sentinel parallel-syncs themaster 2

# Generated by CONFIG REWRITE
dir "/home/bob/app/sentinel-test/slave1"
sentinel config-epoch themaster 0
sentinel leader-epoch themaster 322
sentinel known-slave themaster 10.75.196.216 7001
sentinel known-slave themaster 10.75.196.216 7002
sentinel known-sentinel themaster 10.75.196.216 26381 8497ee90c1e4525c0f957407fefa77427f427e0d
sentinel known-sentinel themaster 10.75.196.216 26379 bdddadb6e825065398be0bae214891d7ccbd6e2a
sentinel current-epoch 322

Sentinel#3:

port 26381
protected-mode no

sentinel myid 8497ee90c1e4525c0f957407fefa77427f427e0d
sentinel monitor themaster 10.75.196.216 7000 2
sentinel down-after-milliseconds themaster 3000
sentinel failover-timeout themaster 5000
sentinel parallel-syncs themaster 2

# Generated by CONFIG REWRITE
dir "/home/bob/app/sentinel-test/slave2"
sentinel config-epoch themaster 0
sentinel leader-epoch themaster 322
sentinel known-slave themaster 10.75.196.216 7001
sentinel known-slave themaster 10.75.196.216 7002
sentinel known-sentinel themaster 10.75.196.216 26379 bdddadb6e825065398be0bae214891d7ccbd6e2a
sentinel known-sentinel themaster 10.75.196.216 26380 181fb84351d6b96e0120bfa68331738ef111c49f
sentinel current-epoch 322
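For reference, the trailing `2` in `sentinel monitor themaster 10.75.196.216 7000 2` is the quorum: the number of Sentinels that must agree the master is unreachable before it is flagged as objectively down (ODOWN). Actually starting the failover additionally requires one Sentinel to be elected leader by a majority of the Sentinels, so the effective vote threshold is the larger of the two numbers. The small Python sketch below only makes that arithmetic explicit for this three-Sentinel setup; `parse_monitor` and `votes_to_authorize_failover` are hypothetical helper names, not part of any Redis tooling.

```python
# Minimal sketch: read a "sentinel monitor" line and work out how many
# Sentinels must agree at each stage of a failover.

def parse_monitor(line: str) -> dict:
    """Parse 'sentinel monitor <name> <ip> <port> <quorum>'."""
    parts = line.split()
    if len(parts) != 6 or parts[:2] != ["sentinel", "monitor"]:
        raise ValueError(f"not a sentinel monitor line: {line!r}")
    _, _, name, ip, port, quorum = parts
    return {"name": name, "ip": ip, "port": int(port), "quorum": int(quorum)}


def votes_to_authorize_failover(num_sentinels: int, quorum: int) -> int:
    """Votes a Sentinel needs to become failover leader.

    The quorum only governs the ODOWN decision; the leader election also
    needs a majority of all known Sentinels, so the larger value applies.
    """
    majority = num_sentinels // 2 + 1
    return max(majority, quorum)


if __name__ == "__main__":
    cfg = parse_monitor("sentinel monitor themaster 10.75.196.216 7000 2")
    print(cfg)
    print("sentinels that must see the master down:", cfg["quorum"])
    print("votes needed to start the failover:",
          votes_to_authorize_failover(3, cfg["quorum"]))
```

With three Sentinels and a quorum of 2, both thresholds come out to 2, so this part of the configuration is consistent.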

Master console log: (screenshot)

Sentinel#1 console log: (screenshot)

Sentinel#1 INFO command output: (screenshot)

Sentinel#1 log after the master failed: (screenshot)

What is wrong with my configuration?

Thanks in advance.

1 Answer:

Answer 0 (score: 0)

OK, if you look closely at the Sentinel log, right at startup, even before the master instance stops working, it reports that both slaves are down:

redis error : +sdown slave
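A quick way to check a Sentinel log for these events is to scan it for `+sdown slave` lines. The Python sketch below assumes the usual Sentinel event layout shown in its comment (verify it against your own log); the function name `sdown_slaves` is hypothetical.

```python
import re
import sys

# Assumed layout of a Sentinel event line (check against your own log):
#   <pid>:X <date> <time> # +sdown slave <ip>:<port> <ip> <port> @ <master> <mip> <mport>
SDOWN_SLAVE = re.compile(r"\+sdown slave (?P<addr>\S+) \S+ \S+ @ (?P<master>\S+)")


def sdown_slaves(log_lines):
    """Return (replica address, master name) for each '+sdown slave' event."""
    hits = []
    for line in log_lines:
        m = SDOWN_SLAVE.search(line)
        if m:
            hits.append((m.group("addr"), m.group("master")))
    return hits


if __name__ == "__main__":
    # Usage: python scan_sdown.py sentinel.log
    with open(sys.argv[1], encoding="utf-8") as f:
        for addr, master in sdown_slaves(f):
            print(f"replica {addr} of {master} flagged as subjectively down")
```

Only `+sdown` (entering the down state) is matched; the `-sdown` event that Sentinel logs when a replica comes back is ignored.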

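Those down slaves are probably why none of them was good enough to become the new master, and why, after the master goes down, the sentinel log shows the -failover-abort-no-good-slave error.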
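Sentinel only promotes a replica it considers healthy. The sketch below is a simplified Python model of that choice, based on the replica-selection rules in the Redis Sentinel documentation: replicas that are down or have priority 0 are skipped, then the rest are ranked by priority, replication offset and run ID. It shows why two replicas already flagged `+sdown` leave nothing to promote. The `Replica` class and `pick_replica` function are hypothetical; the real selection also checks link state, ping age and how long each replica has been disconnected from the master.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    addr: str
    sdown: bool            # Sentinel currently flags it as subjectively down
    priority: int = 100    # slave-priority (replica-priority in newer versions); 0 = never promote
    repl_offset: int = 0   # replication offset the replica has processed
    run_id: str = ""


def pick_replica(replicas: List[Replica]) -> Optional[Replica]:
    """Simplified model of Sentinel's replica choice during a failover.

    Lowest priority wins, then the largest replication offset, then the
    lexicographically smallest run ID. None corresponds to the
    '-failover-abort-no-good-slave' outcome.
    """
    candidates = [r for r in replicas if not r.sdown and r.priority != 0]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


if __name__ == "__main__":
    # Both replicas flagged +sdown, as in the Sentinel log from the question.
    both_down = [Replica("10.75.196.216:7001", sdown=True),
                 Replica("10.75.196.216:7002", sdown=True)]
    print(pick_replica(both_down))  # None: nothing to promote, failover aborts
```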

Then I remembered that I had been getting the following error:

(error) READONLY You can't write against a read only slave.

I got that error when I tried to set a key on a slave node through redis-cli, so I decided to fix it by adding the following line to both slave configuration files:

slave-read-only no
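If you manage the configs with scripts, the same change can be applied to both slave files programmatically. The Python sketch below sets a `key value` directive in redis.conf-style text, replacing an existing line for that key or appending it if absent. The helper names and the file names `slave1.conf` / `slave2.conf` are hypothetical; adjust them to your layout.

```python
from pathlib import Path


def set_directive(conf_text: str, key: str, value: str) -> str:
    """Return conf_text with `key value` set, replacing any existing line for key."""
    new_line = f"{key} {value}"
    out, replaced = [], False
    for line in conf_text.splitlines():
        words = line.split()
        # Commented-out lines start with '#', so they never match the key.
        if words and words[0].lower() == key.lower():
            out.append(new_line)   # overwrite e.g. 'slave-read-only yes'
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)       # directive absent: append it
    return "\n".join(out) + "\n"


def update_file(path: Path, key: str, value: str) -> None:
    """Rewrite a config file in place with the directive set."""
    text = path.read_text(encoding="utf-8")
    path.write_text(set_directive(text, key, value), encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical paths for the two slave configs.
    for conf in (Path("slave1.conf"), Path("slave2.conf")):
        update_file(conf, "slave-read-only", "no")
```

The running servers only pick up the edited file after a restart.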

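After making this change and restarting everything, the +sdown slave entries no longer appeared in the sentinel log, and the main problem was fixed too: the Sentinels can now fail over to a new slave instance when the master fails.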
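To confirm that a failover actually happened, each Sentinel can be asked which address it currently reports for the master with `SENTINEL get-master-addr-by-name themaster` (for example `redis-cli -p 26379 SENTINEL get-master-addr-by-name themaster`). The Python sketch below sends that command over a raw socket using the RESP protocol, so it needs no client library. The helper names are hypothetical, and the parser only handles the reply types this check needs.

```python
import io
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%b\r\n" % (len(data), data))
    return b"".join(parts)


def read_reply(f):
    """Read one RESP reply from a binary file-like object (subset of types)."""
    head = f.readline().rstrip(b"\r\n")
    kind, rest = head[:1], head[1:]
    if kind == b"+":                      # simple string
        return rest.decode()
    if kind == b"-":                      # error reply
        raise RuntimeError(rest.decode())
    if kind == b":":                      # integer
        return int(rest)
    if kind == b"$":                      # bulk string, -1 means null
        n = int(rest)
        return None if n == -1 else f.read(n + 2)[:-2].decode()
    if kind == b"*":                      # array, -1 means null (unknown master)
        n = int(rest)
        return None if n == -1 else [read_reply(f) for _ in range(n)]
    raise ValueError(f"unexpected reply: {head!r}")


def parse_reply(buf: bytes):
    """Convenience wrapper: parse a complete reply held in memory."""
    return read_reply(io.BytesIO(buf))


def get_master_addr(host: str, port: int, master_name: str):
    """Ask one Sentinel which [ip, port] it currently reports for master_name."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(encode_command("SENTINEL", "get-master-addr-by-name", master_name))
        with sock.makefile("rb") as f:
            return read_reply(f)


if __name__ == "__main__":
    # The three Sentinels from the question; after a successful failover they
    # should report 7001 or 7002 instead of 7000.
    for sentinel_port in (26379, 26380, 26381):
        print(sentinel_port, get_master_addr("10.75.196.216", sentinel_port, "themaster"))
```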

From what I have seen online, someone else had a similar +sdown problem, but in their case the cause was authentication.

Thanks to everyone who shares their knowledge and experience. I hope this helps someone.