Kafka halting because log truncation is not allowed for topic, shutting down the Kafka node

Date: 2018-08-27 10:06:18

Tags: apache-kafka

We are having trouble with our Kafka setup: every few days one of our Kafka nodes goes down with the following error:

 Halting because log truncation is not allowed for topic __consumer_offsets,
 Current leader 11's latest offset 123 is less than replica 13's latest offset 234.

The error log mentions a different topic each time. We have 3 Kafka nodes and 3 ZooKeeper nodes. What is causing this issue, and how can we fix it?

Here is the code that performs this check:

 /**
 * Unclean leader election: A follower goes down, in the meanwhile the leader keeps appending messages. The follower comes back up
 * and before it has completely caught up with the leader's logs, all replicas in the ISR go down. The follower is now uncleanly
 * elected as the new leader, and it starts appending messages from the client. The old leader comes back up, becomes a follower
 * and it may discover that the current leader's end offset is behind its own end offset.
 *
 * In such a case, truncate the current follower's log to the current leader's end offset and continue fetching.
 *
 * There is a potential for a mismatch between the logs of the two replicas here. We don't fix this mismatch as of now.
 */
val leaderEndOffset: Long = earliestOrLatestOffset(topicPartition, ListOffsetRequest.LATEST_TIMESTAMP)

if (leaderEndOffset < replica.logEndOffset.messageOffset) {
  // Prior to truncating the follower's log, ensure that doing so is not disallowed by the configuration for unclean leader election.
  // This situation could only happen if the unclean election configuration for a topic changes while a replica is down. Otherwise,
  // we should never encounter this situation since a non-ISR leader cannot be elected if disallowed by the broker configuration.
  if (!LogConfig.fromProps(brokerConfig.originals, AdminUtils.fetchEntityConfig(replicaMgr.zkUtils,
    ConfigType.Topic, topicPartition.topic)).uncleanLeaderElectionEnable) {
    // Log a fatal error and shutdown the broker to ensure that data loss does not occur unexpectedly.
    fatal(s"Exiting because log truncation is not allowed for partition $topicPartition, current leader " +
      s"${sourceBroker.id}'s latest offset $leaderEndOffset is less than replica ${brokerConfig.brokerId}'s latest " +
      s"offset ${replica.logEndOffset.messageOffset}")
    throw new FatalExitError
  }
  // (the Kafka source then truncates the follower's log to the leader's end offset and continues fetching)
}

Thanks.

1 answer:

Answer 0 (score: 1):

This happens on 0.10.0, even with min.insync.replicas=2.

The leader for a partition writes to the followers before committing (in particular for topics produced with acks=all, such as __consumer_offsets). When a short network outage occurs, a follower can come back up quickly, and before its extra messages have been reconciled with the leader, the replica halts because of the unclean leader election. This is a known issue that was fixed in 0.11.0.
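To confirm which partitions are affected, it can help to inspect the leader, replica, and ISR assignments for the topic named in the error. A minimal sketch using the stock Kafka CLI (the ZooKeeper address is a placeholder for your own):

    # List leader, replicas and ISR for each partition of the affected topic
    bin/kafka-topics.sh --zookeeper localhost:2181 \
      --describe --topic __consumer_offsets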

One possible workaround is to set unclean.leader.election.enable=true for topics such as __consumer_offsets and then restart the brokers (a command sketch follows below). According to the documentation,

unclean.leader.election.enable: Indicates whether to enable replicas not in the ISR set to be elected as leader as a last resort, even though doing so may result in data loss.

When a broker crashes, the controller switches the leaders of its partitions, picking a replica from the ISR as the new partition leader. If no replica in the ISR is available, you can neither write to nor read from that partition. By setting unclean.leader.election.enable to true, the first available replica is elected as partition leader even if it is not in the ISR, so some messages may be lost!
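As a concrete sketch of that workaround using kafka-configs.sh (the ZooKeeper address is a placeholder; note that the override should be removed again once the cluster has recovered, since it re-enables lossy elections):

    # Temporarily allow unclean leader election for the affected topic
    bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
      --entity-type topics --entity-name __consumer_offsets \
      --add-config unclean.leader.election.enable=true

    # After the halted brokers have been restarted and the ISR has recovered,
    # remove the override so that unclean (lossy) elections are disallowed again
    bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
      --entity-type topics --entity-name __consumer_offsets \
      --delete-config unclean.leader.election.enable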

However, to fix this properly, I would recommend upgrading to a more stable release if you are still running 0.10.0.