I'm using Kafka 0.8.2 and one of my Kafka servers died (the data on disk is unrecoverable). A topic with a replication factor of 1 had one partition on the dead server. I assumed reassignment would move that partition's metadata to a new server without needing the data, but the reassignment stays stuck in progress.
I ran:
$ /opt/kafka/kafka/bin/kafka-reassign-partitions.sh --zookeeper myzookeeper.my.com --reassignment-json-file new_assignment.json --verify
Status of partition reassignment:
Reassignment of partition [topicX,1] is still in progress
This will never succeed, because the dead server is never coming back.
In the new server's logs I see:
[2015-05-28 06:25:15,401] INFO Completed load of log topicX-1 with log end offset 0 (kafka.log.Log)
[2015-05-28 06:25:15,402] INFO Created log for partition [topicX,1] in /mnt2/data/kafka with properties {segment.index.bytes -> 10485760, file.delete.delay.ms -> 60000, segment.bytes -> 536870912, flush.ms -> 9223372036854775807, delete.retention.ms -> 86400000, index.interval.bytes -> 4096, retention.bytes -> -1, min.insync.replicas -> 1, cleanup.policy -> delete, unclean.leader.election.enable -> true, segment.ms -> 604800000, max.message.bytes -> 1000012, flush.messages -> 9223372036854775807, min.cleanable.dirty.ratio -> 0.5, retention.ms -> 259200000, segment.jitter.ms -> 0}. (kafka.log.LogManager)
[2015-05-28 06:25:15,403] WARN Partition [topicX,1] on broker 4151132: No checkpointed highwatermark is found for partition [topicX,1] (kafka.cluster.Partition)
[2015-05-28 06:25:15,405] INFO [ReplicaFetcherManager on broker 4151132] Removed fetcher for partitions (kafka.server.ReplicaFetcherManager)
[2015-05-28 06:25:15,408] INFO [ReplicaFetcherManager on broker 4151132] Added fetcher for partitions List() (kafka.server.ReplicaFetcherManager)
[2015-05-28 06:25:15,411] INFO [ReplicaFetcherManager on broker 4151132] Removed fetcher for partitions (kafka.server.ReplicaFetcherManager)
[2015-05-28 06:25:15,413] INFO [ReplicaFetcherManager on broker 4151132] Added fetcher for partitions List() (kafka.server.ReplicaFetcherManager)
Is there any way to force the reassignment to complete, or to abort it?
Answer 0 (score: 4)
You can abort the reassignment by using the zookeeper shell to delete the "/admin/reassign_partitions" znode on your ZooKeeper cluster, and then move the partitions that were assigned to the dead broker onto the new node.
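As a sketch of that procedure (the ZooKeeper host and port here are assumptions based on the question; adjust them for your cluster), using the zookeeper-shell script that ships with Kafka:

```shell
# Open an interactive ZooKeeper shell against the ensemble
# (hostname and port are placeholders from the question)
/opt/kafka/kafka/bin/zookeeper-shell.sh myzookeeper.my.com:2181

# Inside the shell: inspect the stuck reassignment first, then delete the znode
get /admin/reassign_partitions
delete /admin/reassign_partitions
```

Once the znode is gone, the controller considers the reassignment finished, and a fresh kafka-reassign-partitions.sh run can place the partition on a live broker.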
Answer 1 (score: 1)
On Kafka 0.8.2.2, just to confirm Foo L's answer: restarting another machine with the same broker ID solved the problem for us.
Until we brought up that new broker with the same broker ID, the migration stayed stuck and the verify command kept giving the same answer:
./bin/kafka-reassign-partitions.sh --zookeeper "$ZK_SERVERS" --broker-list "$BROKERS_ID" --reassignment-json-file reassignment.json --verify
Result:
Reassignment of partition [topicName,partitionId] is still in progress
Answer 2 (score: 1)
You can enable unclean leader election on all Kafka nodes, after which the reassignment should complete correctly.
I successfully applied this procedure to a cluster of 4 Kafka nodes and 3 ZooKeeper nodes, for the __consumer_offsets topic whose replica node had been decommissioned:
1. Set unclean.leader.election.enable=true and restart every Kafka node in the cluster
2. Check with the --verify flag that every reassignment has completed successfully
3. Set unclean.leader.election.enable=false and restart all Kafka nodes
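The steps above amount to temporarily flipping one broker setting and re-running the verify command. A minimal sketch, assuming the default server.properties location and the hostnames/files from the question:

```shell
# Step 1: on each broker, temporarily allow an out-of-sync replica to become leader
# by setting this in config/server.properties, then restart the broker:
#   unclean.leader.election.enable=true

# Step 2: after all brokers are back, confirm the reassignment has finished
/opt/kafka/kafka/bin/kafka-reassign-partitions.sh \
  --zookeeper myzookeeper.my.com \
  --reassignment-json-file new_assignment.json \
  --verify

# Step 3: revert to unclean.leader.election.enable=false in server.properties
# and restart all brokers again
```

Note that unclean leader election trades durability for availability: the partition comes back online, but any data that only existed on the dead replica is lost (which is acceptable here, since the disk was unrecoverable anyway).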