Kafka halting because log truncation is not allowed for topic, shutting down the Kafka node

Date: 2018-08-27 10:06:18

Tags: apache-kafka

We are having trouble with our Kafka setup: every few days one of our Kafka nodes goes down with the following error:

 Halting because log truncation is not allowed for topic __consumer_offsets,
 Current leader 11's latest offset 123 is less than replica 13's latest offset 234.

The error log mentions a different topic each time. We have 3 Kafka nodes and 3 ZooKeeper nodes. What is causing this issue, and how can we fix it?

Here is the code that performs this check:

 /**
 * Unclean leader election: A follower goes down, in the meanwhile the leader keeps appending messages. The follower comes back up
 * and before it has completely caught up with the leader's logs, all replicas in the ISR go down. The follower is now uncleanly
 * elected as the new leader, and it starts appending messages from the client. The old leader comes back up, becomes a follower
 * and it may discover that the current leader's end offset is behind its own end offset.
 *
 * In such a case, truncate the current follower's log to the current leader's end offset and continue fetching.
 *
 * There is a potential for a mismatch between the logs of the two replicas here. We don't fix this mismatch as of now.
 */
val leaderEndOffset: Long = earliestOrLatestOffset(topicPartition, ListOffsetRequest.LATEST_TIMESTAMP)

if (leaderEndOffset < replica.logEndOffset.messageOffset) {
  // Prior to truncating the follower's log, ensure that doing so is not disallowed by the configuration for unclean leader election.
  // This situation could only happen if the unclean election configuration for a topic changes while a replica is down. Otherwise,
  // we should never encounter this situation since a non-ISR leader cannot be elected if disallowed by the broker configuration.
  if (!LogConfig.fromProps(brokerConfig.originals, AdminUtils.fetchEntityConfig(replicaMgr.zkUtils,
    ConfigType.Topic, topicPartition.topic)).uncleanLeaderElectionEnable) {
    // Log a fatal error and shutdown the broker to ensure that data loss does not occur unexpectedly.
    fatal(s"Exiting because log truncation is not allowed for partition $topicPartition, current leader " +
      s"${sourceBroker.id}'s latest offset $leaderEndOffset is less than replica ${brokerConfig.brokerId}'s latest " +
      s"offset ${replica.logEndOffset.messageOffset}")
    throw new FatalExitError
  }
  // (the Kafka source then truncates the follower's log to the leader's end offset and continues fetching)
}

Thanks.

1 answer:

Answer 0 (score: 1):

This happens on 0.10.0, even with min.insync.replicas=2.

The leader for a partition writes to the followers before committing (in particular for topics produced with acks=all, such as __consumer_offsets). When a short network outage occurs, a follower can come back up quickly, and before its extra messages have been reconciled with the leader, the replica halts because of the unclean leader election. This is a known issue that was fixed in 0.11.0.
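To confirm which partitions are affected, it can help to inspect the leader, replica, and ISR assignments for the topic named in the error. A minimal sketch using the stock Kafka CLI (the ZooKeeper address is a placeholder for your own):

    # List leader, replicas and ISR for each partition of the affected topic
    bin/kafka-topics.sh --zookeeper localhost:2181 \
      --describe --topic __consumer_offsets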

One possible workaround is to set unclean.leader.election.enable=true for topics such as __consumer_offsets and then restart the brokers (a command sketch follows below). According to the documentation,

unclean.leader.election.enable: Indicates whether to enable replicas not in the ISR set to be elected as leader as a last resort, even though doing so may result in data loss.

When a broker crashes, the controller switches the leaders of its partitions, picking a replica from the ISR as the new partition leader. If no replica in the ISR is available, you can neither write to nor read from that partition. By setting unclean.leader.election.enable to true, the first available replica is elected as partition leader even if it is not in the ISR, so some messages may be lost!
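As a concrete sketch of that workaround using kafka-configs.sh (the ZooKeeper address is a placeholder; note that the override should be removed again once the cluster has recovered, since it re-enables lossy elections):

    # Temporarily allow unclean leader election for the affected topic
    bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
      --entity-type topics --entity-name __consumer_offsets \
      --add-config unclean.leader.election.enable=true

    # After the halted brokers have been restarted and the ISR has recovered,
    # remove the override so that unclean (lossy) elections are disallowed again
    bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
      --entity-type topics --entity-name __consumer_offsets \
      --delete-config unclean.leader.election.enable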

However, to fix this properly, I would recommend upgrading to a more stable release if you are still running 0.10.0.