We are having trouble with our Kafka setup: every few days one of our Kafka nodes fails with the following error:
Halting because log truncation is not allowed for topic __consumer_offsets,
Current leader 11's latest offset 123 is less than replica 13's latest offset 234.
The error log mentions a different topic each time. We have 3 Kafka nodes and 3 ZooKeeper nodes. What is causing this problem, and how can we fix it?
Here is the code that performs this check:
/**
* Unclean leader election: A follower goes down, in the meanwhile the leader keeps appending messages. The follower comes back up
* and before it has completely caught up with the leader's logs, all replicas in the ISR go down. The follower is now uncleanly
* elected as the new leader, and it starts appending messages from the client. The old leader comes back up, becomes a follower
* and it may discover that the current leader's end offset is behind its own end offset.
*
* In such a case, truncate the current follower's log to the current leader's end offset and continue fetching.
*
* There is a potential for a mismatch between the logs of the two replicas here. We don't fix this mismatch as of now.
*/
val leaderEndOffset: Long = earliestOrLatestOffset(topicPartition, ListOffsetRequest.LATEST_TIMESTAMP)

if (leaderEndOffset < replica.logEndOffset.messageOffset) {
  // Prior to truncating the follower's log, ensure that doing so is not disallowed by the configuration for unclean leader election.
  // This situation could only happen if the unclean election configuration for a topic changes while a replica is down. Otherwise,
  // we should never encounter this situation since a non-ISR leader cannot be elected if disallowed by the broker configuration.
  if (!LogConfig.fromProps(brokerConfig.originals, AdminUtils.fetchEntityConfig(replicaMgr.zkUtils,
      ConfigType.Topic, topicPartition.topic)).uncleanLeaderElectionEnable) {
    // Log a fatal error and shutdown the broker to ensure that data loss does not occur unexpectedly.
    fatal(s"Exiting because log truncation is not allowed for partition $topicPartition, current leader " +
      s"${sourceBroker.id}'s latest offset $leaderEndOffset is less than replica ${brokerConfig.brokerId}'s latest " +
      s"offset ${replica.logEndOffset.messageOffset}")
    throw new FatalExitError
  }
}
Thanks.
Answer (score: 1)
This happens on 0.10.0, even with min.insync.replicas=2. The leader of a partition writes to its followers before committing the write itself (in particular for topics produced to with acks=all, such as __consumer_offsets). When a brief network outage occurs, a follower can come back up quickly, and before the messages have been written to the leader, the replica halts because of an unclean leader election. This is a known issue that was fixed in 0.11.0.
One possible workaround is to set unclean.leader.election.enable=true for topics such as __consumer_offsets and then restart the brokers. According to {{3}}:
unclean.leader.election.enable: Indicates whether to enable replicas not in the ISR set to be elected as leader as a last resort, even though doing so may result in data loss.
When a broker crashes, the controller switches the partition leaders: it picks one of the replicas in the ISR as the new partition leader. If no such replica is available, you can no longer write to or read from that partition.
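To see which broker currently leads each partition and which replicas are in the ISR, you can describe the topic. This is a sketch, assuming the kafka-topics.sh script bundled with Kafka and a ZooKeeper node reachable at localhost:2181 (a hypothetical address; substitute your own):

```shell
# Describe __consumer_offsets: for each partition this prints the current
# leader broker id, the full replica list, and the in-sync replica (ISR) set.
kafka-topics.sh --zookeeper localhost:2181 \
  --describe --topic __consumer_offsets
```

A partition whose ISR has shrunk to a single replica is the one most at risk of the halting behavior described above.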
By setting unclean.leader.election.enable to true, the first available replica will be elected as partition leader even if it is not in the ISR, and as a result some messages may be lost!
However, to properly resolve this problem, I recommend upgrading to a more stable version if you are still on 0.10.0.