我有一个包含3个节点的zookeeper集群。 Zookeeper配置如下所述。重新启动它时显示成功消息,但它显示状态为失败。
zoo.cfg
dataDir=/ngs/app/<app>/zookeeper-3.4.6/zookeeperdata/1
clientPort=2181
initLimit=5
syncLimit=2
server.1=pr2-ligerp-lapp27.<domain.com>:2888:3888
server.2=pr2-ligerp-lapp28.<domain.com>:2889:3889
server.3=pr2-ligerp-lapp29.<domain.com>:2890:3890
请查看以下日志:
sh zkServer.sh start
JMX enabled by default
Using config: /ngs/app/ligerp/solr/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
-bash-4.1$
-bash-4.1$ cat zookeeper.out
2017-04-18 18:58:13,840 [myid:] - INFO [main:QuorumPeerConfig@103] - Reading configuration from: /ngs/app/ligerp/solr/zookeeper-3.4.6/bin/../conf/zoo.cfg
2017-04-18 18:58:13,843 [myid:] - INFO [main:QuorumPeerConfig@340] - Defaulting to majority quorums
2017-04-18 18:58:13,845 [myid:1] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2017-04-18 18:58:13,845 [myid:1] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2017-04-18 18:58:13,846 [myid:1] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2017-04-18 18:58:13,854 [myid:1] - INFO [main:QuorumPeerMain@127] - Starting quorum peer
2017-04-18 18:58:13,861 [myid:1] - INFO [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:2181
2017-04-18 18:58:13,875 [myid:1] - INFO [main:QuorumPeer@959] - tickTime set to 3000
2017-04-18 18:58:13,875 [myid:1] - INFO [main:QuorumPeer@979] - minSessionTimeout set to -1
2017-04-18 18:58:13,875 [myid:1] - INFO [main:QuorumPeer@990] - maxSessionTimeout set to -1
2017-04-18 18:58:13,875 [myid:1] - INFO [main:QuorumPeer@1005] - initLimit set to 5
2017-04-18 18:58:13,884 [myid:1] - INFO [main:FileSnap@83] - Reading snapshot /ngs/app/ligerp/solr/zookeeper-3.4.6/zookeeperdata/1/version-2/snapshot.1300000032
2017-04-18 18:58:13,954 [myid:1] - INFO [Thread-1:QuorumCnxManager$Listener@504] - My election bind port: pr2-ligerp-lapp27.<domain>/10.136.145.38:3888
2017-04-18 18:58:13,960 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@714] - LOOKING
2017-04-18 18:58:13,961 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@815] - New election. My id = 1, proposed zxid=0x130000024b
2017-04-18 18:58:13,962 [myid:1] - INFO [WorkerReceiver[myid=1]:FastLeaderElection@597] - Notification: 1 (message format version), 1 (n.leader), 0x130000024b (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x13 (n.peerEpoch) LOOKING (my state)
2017-04-18 18:58:13,964 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:13,964 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:14,165 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:14,166 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:14,166 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 400
2017-04-18 18:58:15,566 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:15,567 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:15,567 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 800
2017-04-18 18:58:16,368 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:16,368 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:16,368 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 1600
2017-04-18 18:58:17,969 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (2, 1)
2017-04-18 18:58:17,969 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@193] - Have smaller server identifier, so dropping the connection: (3, 1)
2017-04-18 18:58:17,970 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@849] - Notification time out: 3200
但是在检查状态后,我们发现它没有运行,但我们仍然可以看到进程ID。有人可以帮助解决这个问题。
sh zkServer.sh status
JMX enabled by default
Using config: /ngs/app/ligerp/solr/zookeeper-3.4.6/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
这是zookeeper的领导节点中的例外:
2017-04-18 18:25:32,634 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:47916
2017-04-18 18:25:32,635 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxn@827] - Processing srvr command from /127.0.0.1:47916
2017-04-18 18:25:32,635 [myid:3] - INFO [Thread-22:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:47916 (no session established for client)
2017-04-18 18:30:01,662 [myid:3] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@780] - Connection broken for id 1, my id = 3, error =
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:765)
2017-04-18 18:30:01,663 [myid:3] - WARN [RecvWorker:1:QuorumCnxManager$RecvWorker@783] - Interrupting SendWorker
2017-04-18 18:30:01,662 [myid:3] - ERROR [LearnerHandler-/10.136.145.38:47656:LearnerHandler@633] - Unexpected exception causing shutdown while sock still open
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:546)
2017-04-18 18:30:01,663 [myid:3] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@697] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:849)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:64)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:685)
2017-04-18 18:30:01,663 [myid:3] - WARN [LearnerHandler-/10.136.145.38:47656:LearnerHandler@646] - ******* GOODBYE /10.136.145.38:47656 ********
2017-04-18 18:30:01,663 [myid:3] - WARN [SendWorker:1:QuorumCnxManager$SendWorker@706] - Send worker leaving thread
2017-04-18 18:39:40,076 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxnFactory@197] - Accepted socket connection from /10.136.145.38:58748
2017-04-18 18:39:40,077 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-04-18 18:39:40,078 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxn@1007] - Closed socket connection for client /10.136.145.38:58748 (no session established for client)
2017-04-18 18:42:46,516 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:47988
2017-04-18 18:42:46,516 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2183:NIOServerCnxn@827] - Processing srvr command from /127.0.0.1:47988
2017-04-18 18:42:46,517 [myid:3] - INFO [Thread-23:NIOServerCnxn@1007] - Closed socket connection for client /127.0.0.1:47988 (no session established for client)
答案 0 :(得分:0)
通过在所有zookeeper节点的zoo.cfg中进行以下更改来修复此问题:
示例zoo.cfg如下所示:
\t