Initiated state change for partition from OfflinePartition to OnlinePartition failed

Time: 2018-02-01 13:48:06

Tags: apache-kafka

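I reconfigured my Kafka cluster, changing:

  • the default replication factor from 1 to 3, and
  • the location of the Kafka data directory on disk

After restarting all nodes the cluster seemed fine, but then I noticed that none of the topics would come online. The logs contain a message like this for every topic: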

state-change.log: [2018-02-01 12:41:42,176] ERROR Controller 826437096 epoch 19 initiated state change for partition [filedrop,0] from OfflinePartition to OnlinePartition failed (state.change.logger)

So none of the topics is usable; listing the topics with kafkacat -L -b shows that their leaders are not available.

Metadata for all topics (from broker -1: lol-045:9092/bootstrap):
7 brokers:
broker 826437096 at lol-044:9092
broker 746155422 at lol-047:9092
broker 651737161 at lol-046:9092
broker 728512596 at lol-048:9092
broker 213763378 at lol-045:9092
broker 622553932 at lol-049:9092
broker 746727274 at lol-050:9092
14 topics:
topic "lol.stripped" with 3 partitions:
 partition 2, leader -1, replicas:, isrs:, Broker: Leader not available
 partition 1, leader -1, replicas:, isrs:, Broker: Leader not available
 partition 0, leader -1, replicas:, isrs:, Broker: Leader not available

However, newly created topics are replicated correctly and are healthy:

topic "lol-kafka-health" with 3 partitions:
partition 2, leader 622553932, replicas: 622553932,213763378,651737161, isrs: 622553932,213763378,651737161
partition 1, leader 213763378, replicas: 622553932,213763378,826437096, isrs: 213763378,826437096,622553932
partition 0, leader 826437096, replicas: 213763378,746727274,826437096, isrs: 826437096,746727274,213763378
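
To separate the broken topics from the healthy ones in a long listing, the output can be filtered for partitions whose leader is -1. The following is only a minimal sketch: it assumes the text layout shown in the two listings above, and the function name `leaderless_partitions` is made up for illustration.

```python
import re

# Matches a topic header such as: topic "lol.stripped" with 3 partitions:
TOPIC_RE = re.compile(r'^\s*topic "(?P<topic>[^"]+)" with \d+ partitions?:')
# Matches a partition line such as: partition 2, leader -1, replicas:, isrs:, ...
PARTITION_RE = re.compile(r'^\s*partition (?P<partition>\d+), leader (?P<leader>-?\d+),')


def leaderless_partitions(listing):
    """Return (topic, partition) pairs whose leader is -1 in `kafkacat -L` text."""
    result = []
    topic = None
    for line in listing.splitlines():
        header = TOPIC_RE.match(line)
        if header:
            # Remember which topic the following partition lines belong to.
            topic = header.group("topic")
            continue
        part = PARTITION_RE.match(line)
        if part and topic is not None and int(part.group("leader")) == -1:
            result.append((topic, int(part.group("partition"))))
    return result


if __name__ == "__main__":
    sample = '''
topic "lol.stripped" with 3 partitions:
 partition 2, leader -1, replicas:, isrs:, Broker: Leader not available
 partition 1, leader -1, replicas:, isrs:, Broker: Leader not available
topic "lol-kafka-health" with 3 partitions:
 partition 2, leader 622553932, replicas: 622553932,213763378,651737161, isrs: 622553932,213763378,651737161
'''
    for name, partition in leaderless_partitions(sample):
        print(f"{name}[{partition}] has no leader")
```

In practice the listing text would come from saving the kafkacat output to a file and reading it in; the sample string here just mirrors the lines quoted above.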

So I assume that some kind of metadata corruption occurred during the reconfiguration.

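My question is:

  • Is there any way to bring these topic partitions back online?

Given that:

  • the broker IDs were changed during the reconfiguration
  • Kafka's ZooKeeper cluster was temporarily down during the reconfiguration

Also, is there some procedure I can use to investigate whether these topics are recoverable?

Many thanks in advance!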
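
One possible way to investigate recoverability, given that the broker IDs changed, is to compare the replica assignment stored for each topic against the broker IDs that are currently registered. The sketch below is an illustration under stated assumptions, not a confirmed procedure. It assumes a ZooKeeper-based Kafka where the topic znode (`/brokers/topics/<topic>`) holds JSON of the form `{"version":1,"partitions":{"0":[ids]}}` and where the live IDs are the children of `/brokers/ids`. Reading those znodes is left out. The function name `classify_partitions` and the old broker IDs 1 and 2 in the usage example are hypothetical.

```python
import json


def classify_partitions(assignment_json, live_broker_ids):
    """Group partitions by how many assigned replicas are still registered brokers.

    assignment_json: data of a topic znode (assumed layout, see above).
    live_broker_ids: broker IDs currently registered in ZooKeeper.
    """
    live = set(live_broker_ids)
    assignment = json.loads(assignment_json)["partitions"]
    groups = {"orphaned": [], "partial": [], "healthy": []}
    for partition, replicas in sorted(assignment.items(), key=lambda kv: int(kv[0])):
        alive = [r for r in replicas if r in live]
        if not alive:
            # No assigned replica maps to a registered broker: no leader can be elected.
            groups["orphaned"].append(int(partition))
        elif len(alive) < len(replicas):
            groups["partial"].append(int(partition))
        else:
            groups["healthy"].append(int(partition))
    return groups


if __name__ == "__main__":
    # Hypothetical znode content: partitions 0 and 1 still point at old IDs 1 and 2.
    znode = '{"version":1,"partitions":{"0":[1],"1":[2],"2":[826437096]}}'
    print(classify_partitions(znode, [826437096, 746155422]))
```

A partition reported as "orphaned" still has its data on disk only if some broker's data directory contains the matching partition folder, so this check narrows down where to look rather than proving recoverability.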

1 answer:

Answer 0 (score: 0)

The procedure described here allowed me to reassign the leaderless partitions to the new broker IDs:

https://community.cloudera.com/t5/Data-Ingestion-Integration/Move-partitions-from-invalid-leader/td-p/43334
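
The linked thread is not reproduced here. As a rough illustration of what reassigning partitions to new broker IDs can look like, the sketch below builds a plan in the JSON format that `kafka-reassign-partitions.sh` accepts through `--reassignment-json-file`. The helper `build_reassignment` and the choice of target broker in the example are assumptions made for illustration. Which broker actually holds each partition's data has to be established first, and whether the tool completes a reassignment when the old replicas are no longer registered is not confirmed by the answer.

```python
import json


def build_reassignment(replica_plan):
    """Render a reassignment plan as JSON.

    replica_plan: dict mapping (topic, partition) -> list of target broker IDs.
    """
    partitions = [
        {"topic": topic, "partition": partition, "replicas": list(replicas)}
        for (topic, partition), replicas in sorted(replica_plan.items())
    ]
    return json.dumps({"version": 1, "partitions": partitions}, indent=2)


if __name__ == "__main__":
    # Hypothetical plan: put partition [filedrop,0] on the broker now known as 826437096.
    plan = {("filedrop", 0): [826437096]}
    print(build_reassignment(plan))
```

The printed JSON would be saved to a file and passed to the reassignment tool. Listing the target replicas explicitly keeps the operator in control of which broker becomes leader, which matters when only one broker still has the data.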