Does ZooKeeper survive dropping one node in a three-node cluster?

Date: 2016-05-17 09:39:35

Tags: apache-kafka apache-zookeeper fault-tolerance

  1. I have seen a similar question, Zoopekeeper instances in Kafka, but it is still unanswered.
  2. So here is my extended version of the question (with more details):

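    1. Environment: the business application has 3 nodes. Each node embeds its own ZooKeeper node and its own Kafka node (one of each).
    2. I should clarify this up front to avoid confusion. My business application is built on top of Elasticsearch, running 3 nodes with minimumMasterNodes = 2, so the fault tolerance of my application cluster is 1. I therefore assumed that, in the same way, I could give each application its own ZooKeeper node and Kafka node. The overall goal is to build inter-datacenter data replication for the business application on top of this stack, using Kafka MirrorMaker, with a fault tolerance of 1.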

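      In my experiment I did not use the full stack of my business application, only ZooKeeper + Kafka inside each application node. Each application writes its log to the console, so I can tell which one has started its ZooKeeper in LEADER mode.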

      My ZooKeeper ensemble configuration is:

      server.1=localhost:2668:3668
      server.2=localhost:2669:3669
      server.3=localhost:2670:3670
      syncLimit=5
      initLimit=10
      # each node has its own client port: 2182, 2183, 2184 for servers 1, 2, 3 respectively
      clientPort=*
      # * is 1, 2, 3 for servers 1, 2, 3 respectively
      dataDir=D:\rtest\3-nodes\data\*\zoo
      dataLogDir=D:\rtest\3-nodes\data\*\zoo\log
      
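      1. My failure scenario is:
         2.1. Start all three application nodes. Start a consumer (console output). Start an application that produces a sequence of messages. Make sure the consumer receives the messages through the Kafka cluster.
         2.2. Kill the application whose ZooKeeper instance is the leader (in my case, server #3).
         2.3. Observe that the consumer no longer outputs any new messages from the Kafka topic.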
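      2. In my view, the problem lies with ZooKeeper. Below are excerpts from the logs produced by the surviving nodes 1 and 2. It looks as if the ZooKeeper servers keep trying to reach the dropped server instead of agreeing on a quorum between themselves... By the way, in this situation I cannot even use the console client to connect to ZooKeeper (to be precise, I can connect, but on the first command, say "ls /", the console client crashes with an exception).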

        Server 1:

        15459 [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2182] WARN  org.apache.zookeeper.server.quorum.Learner  - Exception when following the leader
        java.net.SocketException: Connection reset
                at java.net.SocketInputStream.read(Unknown Source)
                at java.net.SocketInputStream.read(Unknown Source)
                at java.io.BufferedInputStream.fill(Unknown Source)
                at java.io.BufferedInputStream.read(Unknown Source)
                at java.io.DataInputStream.readInt(Unknown Source)
                at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
                at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
                at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
                at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
                at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
                at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
        15460 [Thread-3-SendThread(127.0.0.1:2184)] WARN  org.apache.zookeeper.ClientCnxn  - Session 0x354b9dbe0b90001 for server 127.0.0.1/127.0.0.1:2184, unexpected error, closing socket connection and attempting reconnect
        java.io.IOException: An existing connection was forcibly closed by the remote host
                at sun.nio.ch.SocketDispatcher.read0(Native Method)
                at sun.nio.ch.SocketDispatcher.read(Unknown Source)
                at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
                at sun.nio.ch.IOUtil.read(Unknown Source)
                at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
                at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
                at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
                at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
        15459 [Thread-3-SendThread(0:0:0:0:0:0:0:1:2184)] WARN  org.apache.zookeeper.ClientCnxn  - Session 0x354b9dbe0b90000 for server 0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2184, unexpected error, closing socket connection and attempting reconnect
        java.io.IOException: An existing connection was forcibly closed by the remote host
                at sun.nio.ch.SocketDispatcher.read0(Native Method)
                at sun.nio.ch.SocketDispatcher.read(Unknown Source)
                at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
                at sun.nio.ch.IOUtil.read(Unknown Source)
                at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
                at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
                at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
                at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
        15459 [RecvWorker:3] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Connection broken for id 3, my id = 1, error = java.net.SocketException: Connection reset
                at java.net.SocketInputStream.read(Unknown Source)
                at java.net.SocketInputStream.read(Unknown Source)
                at java.net.SocketInputStream.read(Unknown Source)
                at java.io.DataInputStream.readInt(Unknown Source)
                at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
        15462 [RecvWorker:3] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Interrupting SendWorker
        15462 [SendWorker:3] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Interrupted while waiting for message on queue java.lang.InterruptedException
                at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown Source)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
                at java.util.concurrent.ArrayBlockingQueue.poll(Unknown Source)
                at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
                at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
                at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
        15462 [SendWorker:3] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Send worker leaving thread
        15766 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182] WARN  org.apache.zookeeper.server.NIOServerCnxn  - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
        16481 [WorkerSender[myid=1]] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Cannot open channel to 3 at election address localhost/127.0.0.1:3670
        java.net.ConnectException: Connection refused: connect
                at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
                at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
                at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
                at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
                at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
                at java.net.PlainSocketImpl.connect(Unknown Source)
                at java.net.SocksSocketImpl.connect(Unknown Source)
                at java.net.Socket.connect(Unknown Source)
                at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
                at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
                at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
                at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
                at java.lang.Thread.run(Unknown Source)
        16596 [Thread-3-SendThread(127.0.0.1:2184)] WARN  org.apache.zookeeper.ClientCnxn  - Session 0x354b9dbe0b90000 for server null, unexpected error, closing socket connection and attempting reconnect
        java.net.ConnectException: Connection refused: no further information
                at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
                at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
                at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
                at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
        ...
        

        Server 2:

        ...
        5118 [RecvWorker:3] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Connection broken for id 3, my id = 2, error =
        java.net.SocketException: Connection reset
                at java.net.SocketInputStream.read(Unknown Source)
                at java.net.SocketInputStream.read(Unknown Source)
                at java.net.SocketInputStream.read(Unknown Source)
                at java.io.DataInputStream.readInt(Unknown Source)
                at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
        5121 [RecvWorker:3] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Interrupting SendWorker
        5120 [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2183] WARN  org.apache.zookeeper.server.quorum.Learner  - Exception when following the leader
        java.net.SocketException: Connection reset
                at java.net.SocketInputStream.read(Unknown Source)
                at java.net.SocketInputStream.read(Unknown Source)
                at java.io.BufferedInputStream.fill(Unknown Source)
                at java.io.BufferedInputStream.read(Unknown Source)
                at java.io.DataInputStream.readInt(Unknown Source)
                at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
                at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
                at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
                at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
                at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
                at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
        5119 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183] WARN  org.apache.zookeeper.server.NIOServerCnxn  - Exception causing close of session 0x254b9dbe0b20000 due to java.io.IOException: An existing connection was forcibly closed by the remote host
        5122 [SendWorker:3] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Interrupted while waiting for message on queue
        java.lang.InterruptedException
                at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown Source)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
                at java.util.concurrent.ArrayBlockingQueue.poll(Unknown Source)
                at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
                at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
                at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
        5123 [SendWorker:3] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Send worker leaving thread
        5536 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183] WARN  org.apache.zookeeper.server.NIOServerCnxn  - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
        6143 [WorkerSender[myid=2]] WARN  org.apache.zookeeper.server.quorum.QuorumCnxManager  - Cannot open channel to 3 at election address localhost/127.0.0.1:3670
        java.net.ConnectException: Connection refused: connect
                at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
                at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
                at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
                at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
                at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
                at java.net.PlainSocketImpl.connect(Unknown Source)
                at java.net.SocksSocketImpl.connect(Unknown Source)
                at java.net.Socket.connect(Unknown Source)
                at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
                at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
                at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
                at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
                at java.lang.Thread.run(Unknown Source)
        ....
        
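        By the way, with 4 such nodes the system behaves exactly as my requirements demand. Can anyone tell me whether a 3-node ZooKeeper cluster can survive the death of one node? Or am I doing something wrong?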
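To check, after killing the leader, whether the two surviving servers actually re-formed a quorum, one option is to ask each member for its role with ZooKeeper's "srvr" four-letter-word command, whose reply contains a "Mode:" line (leader, follower or standalone). A server that is still in leader election is not serving requests and does not report a mode. The following is a minimal sketch, assuming the client ports 2182, 2183 and 2184 from the configuration above and that four-letter words are enabled (newer ZooKeeper releases control them with the 4lw.commands.whitelist setting). The names ENSEMBLE, parse_mode and query_mode are hypothetical helpers, not part of any ZooKeeper API.

```python
import socket
from typing import Optional

# Hypothetical list of ensemble members: client ports taken from the question's config.
ENSEMBLE = [("localhost", 2182), ("localhost", 2183), ("localhost", 2184)]


def parse_mode(srvr_output: str) -> Optional[str]:
    """Return the value of the 'Mode:' line in a `srvr` reply, or None if absent."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_mode(host: str, port: int, timeout: float = 2.0) -> Optional[str]:
    """Send the `srvr` four-letter word and return the reported mode.

    Returns None if the server is unreachable, or if it replies without a
    'Mode:' line (for example while a leader election is still in progress).
    """
    chunks = []
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            while True:
                data = sock.recv(4096)
                if not data:  # the server closes the connection after replying
                    break
                chunks.append(data)
    except OSError:  # refused, reset or timed out
        if not chunks:
            return None
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))


if __name__ == "__main__":
    for host, port in ENSEMBLE:
        mode = query_mode(host, port)
        print(f"{host}:{port} -> {mode or 'not serving / unreachable'}")
```

A related, hedged observation: in the Server 1 log the client threads are named Thread-3-SendThread(127.0.0.1:2184), so those sessions were attached to port 2184, the killed server #3, and the later reconnect attempt shown also targets 2184. A ZooKeeper client can only fail over to servers listed in its connection string, so it may be worth checking that the Kafka brokers' zookeeper.connect lists all three members, for example localhost:2182,localhost:2183,localhost:2184.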

1 Answer:

Answer 0 (score: 0)

A 3-node cluster can lose 1 node, and a 5-node cluster can lose 2. A similar question was asked here: ZooKeeper reliability - three versus five nodes
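The rule in this answer follows from ZooKeeper's majority quorum: an ensemble of n voting servers needs floor(n/2) + 1 of them to elect a leader and commit writes, so it tolerates n minus that many failures. The same arithmetic is behind minimumMasterNodes = 2 for three Elasticsearch nodes in the question, and it shows that a fourth node does not raise the tolerance above 1. A minimal sketch (the function names are illustrative, not a ZooKeeper API):

```python
def quorum_size(n: int) -> int:
    """Smallest strict majority of n voting servers."""
    if n < 1:
        raise ValueError("ensemble must have at least one server")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers can fail while a majority is still available."""
    return n - quorum_size(n)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} servers: quorum = {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Running it shows that 3 and 4 servers both tolerate 1 failure, while 5 and 6 both tolerate 2, which is why odd ensemble sizes are the usual choice.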