I have a 3-member Kafka cluster, and the __consumer_offsets topic has 50 partitions.
Here is the output of the describe command:
root@kafka-cluster-0:~# kafka-topics.sh --zookeeper localhost:2181 --describe
Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:1 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
Topic: __consumer_offsets Partition: 0 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 1 Leader: -1 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 2 Leader: 0 Replicas: 0 Isr: 0
Topic: __consumer_offsets Partition: 3 Leader: 1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 4 Leader: -1 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 5 Leader: 0 Replicas: 0 Isr: 0
...
...
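The members are nodes 0, 1, and 2.
It is clear that the partitions whose only replica is node 2 have no leader assigned: for all of them, leader = -1.
I would like to know what caused this. I restarted the Kafka service on the second member, but I never expected it to have this kind of side effect.
All nodes have now been up and running for hours, and this is the result of ls /brokers/ids: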
/home/kafka/bin/zookeeper-shell.sh localhost:2181 <<< "ls /brokers/ids"
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is disabled
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[0, 1, 2]
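Also, there are many topics in the cluster, and node 2 is not the leader for any of them. Wherever node 2 is the only broker holding the data (replication-factor=1 and the partition is hosted on that node), leader=-1, as the output below shows.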
Here, node 2 is in the ISR for some partitions but is never the leader; because replication-factor=2, the other replica holds leadership.
Topic:upstream-t2 PartitionCount:20 ReplicationFactor:2 Configs:retention.ms=172800000,retention.bytes=536870912
Topic: upstream-t2 Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: upstream-t2 Partition: 1 Leader: 0 Replicas: 2,0 Isr: 0
Topic: upstream-t2 Partition: 2 Leader: 0 Replicas: 0,1 Isr: 0
Topic: upstream-t2 Partition: 3 Leader: 0 Replicas: 1,0 Isr: 0
Topic: upstream-t2 Partition: 4 Leader: 1 Replicas: 2,1 Isr: 1,2
Topic: upstream-t2 Partition: 5 Leader: 0 Replicas: 0,2 Isr: 0
Topic: upstream-t2 Partition: 6 Leader: 1 Replicas: 1,2 Isr: 1,2
Here, node 2 is the only replica for some partitions, and for those partitions leader=-1.
Topic:upstream-t20 PartitionCount:10 ReplicationFactor:1 Configs:retention.ms=172800000,retention.bytes=536870912
Topic: upstream-t20 Partition: 0 Leader: 1 Replicas: 1 Isr: 1
Topic: upstream-t20 Partition: 1 Leader: -1 Replicas: 2 Isr: 2
Topic: upstream-t20 Partition: 2 Leader: 0 Replicas: 0 Isr: 0
Topic: upstream-t20 Partition: 3 Leader: 1 Replicas: 1 Isr: 1
Topic: upstream-t20 Partition: 4 Leader: -1 Replicas: 2 Isr: 2
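The pattern described above (every leaderless partition has broker 2 as its only replica) can be checked mechanically rather than by eye. The following is a minimal sketch, not part of the original post: it assumes the text format of the `kafka-topics.sh --describe` output shown here, and the function name `leaderless_partitions` is made up for illustration.

```python
import re
from collections import defaultdict

# Matches partition lines such as:
#   Topic: upstream-t20 Partition: 1 Leader: -1 Replicas: 2 Isr: 2
# The topic summary line (PartitionCount:...) does not match and is skipped.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def leaderless_partitions(describe_output):
    """Group partitions reporting Leader: -1 by their replica assignment.

    Returns a dict mapping a tuple of replica broker ids to a list of
    (topic, partition) pairs.
    """
    result = defaultdict(list)
    for line in describe_output.splitlines():
        match = LINE_RE.search(line)
        if match and int(match.group("leader")) == -1:
            replicas = tuple(int(r) for r in match.group("replicas").split(","))
            result[replicas].append(
                (match.group("topic"), int(match.group("partition")))
            )
    return dict(result)


# Sample input copied from the describe output in this post.
SAMPLE = """\
Topic:upstream-t20 PartitionCount:10 ReplicationFactor:1 Configs:retention.ms=172800000,retention.bytes=536870912
Topic: upstream-t20 Partition: 0 Leader: 1 Replicas: 1 Isr: 1
Topic: upstream-t20 Partition: 1 Leader: -1 Replicas: 2 Isr: 2
Topic: upstream-t20 Partition: 2 Leader: 0 Replicas: 0 Isr: 0
Topic: upstream-t20 Partition: 3 Leader: 1 Replicas: 1 Isr: 1
Topic: upstream-t20 Partition: 4 Leader: -1 Replicas: 2 Isr: 2
"""

if __name__ == "__main__":
    for replicas, partitions in leaderless_partitions(SAMPLE).items():
        print("replicas", replicas, "->", partitions)
```

If every key in the result is `(2,)`, the problem is confined to partitions hosted solely on broker 2. For a quick look without a script, `kafka-topics.sh --describe` also accepts `--unavailable-partitions`, which limits the output to partitions whose leader is not available.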
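Any help on how to fix the unelected leaders is highly appreciated.
It would also be good to know what impact this might have on the behavior of my brokers.
Edit ---
Kafka version: 1.1.0 (2.12-1.1.0). Disk space is not an issue; there is about 800 GB free. The log files look completely normal. Below are the last lines of the log on node 2. Please let me know if there is anything in particular I should look for.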
[2018-12-18 10:31:43,828] INFO [Log partition=upstream-t14-1, dir=/var/lib/kafka] Rolled new log segment at offset 79149636 in 2 ms. (kafka.log.Log)
[2018-12-18 10:32:03,622] INFO Updated PartitionLeaderEpoch. New: {epoch:10, offset:6435}, Current: {epoch:8, offset:6386} for Partition: upstream-t41-8. Cache now contains 7 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-12-18 10:32:03,693] INFO Updated PartitionLeaderEpoch. New: {epoch:10, offset:6333}, Current: {epoch:8, offset:6324} for Partition: upstream-t41-3. Cache now contains 7 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-12-18 10:38:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-12-18 10:40:04,831] INFO Updated PartitionLeaderEpoch. New: {epoch:10, offset:6354}, Current: {epoch:8, offset:6340} for Partition: upstream-t41-9. Cache now contains 7 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-12-18 10:48:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-12-18 10:58:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-12-18 11:05:50,770] INFO [ProducerStateManager partition=upstream-t4-17] Writing producer snapshot at offset 3086815 (kafka.log.ProducerStateManager)
[2018-12-18 11:05:50,772] INFO [Log partition=upstream-t4-17, dir=/var/lib/kafka] Rolled new log segment at offset 3086815 in 2 ms. (kafka.log.Log)
[2018-12-18 11:07:16,634] INFO [ProducerStateManager partition=upstream-t4-11] Writing producer snapshot at offset 3086497 (kafka.log.ProducerStateManager)
[2018-12-18 11:07:16,635] INFO [Log partition=upstream-t4-11, dir=/var/lib/kafka] Rolled new log segment at offset 3086497 in 1 ms. (kafka.log.Log)
[2018-12-18 11:08:15,803] INFO [ProducerStateManager partition=upstream-t4-5] Writing producer snapshot at offset 3086616 (kafka.log.ProducerStateManager)
[2018-12-18 11:08:15,804] INFO [Log partition=upstream-t4-5, dir=/var/lib/kafka] Rolled new log segment at offset 3086616 in 1 ms. (kafka.log.Log)
[2018-12-18 11:08:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
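Edit 2 ----
Well, I stopped the leader ZooKeeper instance, and the second ZooKeeper instance has now been elected leader. With that, the unelected-leader problem is resolved!
I still don't know what went wrong, though, so any ideas on why changing the ZooKeeper leader fixes the unelected-leader problem are very welcome.
Thanks!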
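Answer 0 (score: 0)
Although the root cause was never confirmed, the asker does seem to have found a solution:
"I have stopped the leader ZooKeeper instance, and now the second ZooKeeper instance is elected as leader! With this, the unelected-leader issue is now resolved!"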
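To repeat this workaround, one first has to know which ZooKeeper instance currently holds the leader role. The sketch below is one possible way to find out and is not taken from the original post: it assumes the `srvr` four-letter command is enabled on the ensemble, and the host names `zk-0`, `zk-1`, `zk-2` are placeholders.

```python
import socket


def parse_mode(srvr_response):
    """Return the value of the 'Mode:' line (e.g. leader, follower), or None."""
    for line in srvr_response.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def zk_mode(host, port=2181, timeout=5.0):
    """Send the 'srvr' four-letter command to one ZooKeeper server.

    The server writes its status and closes the connection, so we read
    until recv() returns an empty byte string.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))


if __name__ == "__main__":
    # Hypothetical host names; replace them with the real ensemble members.
    for host in ("zk-0", "zk-1", "zk-2"):
        try:
            print(host, zk_mode(host))
        except OSError as exc:
            print(host, "unreachable:", exc)
```

Stopping the instance that reports `leader` forces a new ZooKeeper leader election, which is what the asker did. Why that cleared the `Leader: -1` state remains unexplained here, so this only reproduces the workaround and does not diagnose the cause.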