Kafka LEADER_NOT_AVAILABLE even though the topic has a leader

Time: 2018-03-16 09:00:53

Tags: java apache-kafka

We have 3 Kafka (1.0.0) nodes and a topic with 4 partitions and a replication factor of 3. The topic normally looks like this:

Topic:MissionControlTopic   PartitionCount:4    ReplicationFactor:3 Configs:
Topic: MissionControlTopic  Partition: 0    Leader: 0   Replicas: 0,1,2 Isr: 2,1,0
Topic: MissionControlTopic  Partition: 1    Leader: 1   Replicas: 1,2,0 Isr: 2,1,0
Topic: MissionControlTopic  Partition: 2    Leader: 2   Replicas: 2,0,1 Isr: 2,1,0
Topic: MissionControlTopic  Partition: 3    Leader: 0   Replicas: 0,2,1 Isr: 2,1,0

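Every once in a while, node 0 stops responding (this is a problem, but not the problem I am asking about). When that happens, the other two nodes correctly take over its partitions, and the topic looks like this: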

Topic:MissionControlTopic   PartitionCount:4    ReplicationFactor:3 Configs:
Topic: MissionControlTopic  Partition: 0    Leader: 1   Replicas: 0,1,2 Isr: 2,1
Topic: MissionControlTopic  Partition: 1    Leader: 1   Replicas: 1,2,0 Isr: 2,1
Topic: MissionControlTopic  Partition: 2    Leader: 2   Replicas: 2,0,1 Isr: 2,1
Topic: MissionControlTopic  Partition: 3    Leader: 2   Replicas: 0,2,1 Isr: 2,1
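
To make the difference between the two listings explicit, here is a minimal sketch that scans describe output in the format shown above and flags partitions whose leader is not the first (preferred) replica or whose ISR is smaller than the replica list. The class and method names are hypothetical, and the regular expression assumes exactly the column layout printed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DescribeOutputCheck {
    // Matches one per-partition line; the summary line ("PartitionCount:...") does not match.
    private static final Pattern LINE = Pattern.compile(
        "Partition:\\s*(\\d+)\\s+Leader:\\s*(-?\\d+)\\s+Replicas:\\s*([\\d,]+)\\s+Isr:\\s*([\\d,]*)");

    /** Returns one note per deviation from the normal state (preferred leader, full ISR). */
    public static List<String> findAnomalies(String describeOutput) {
        List<String> notes = new ArrayList<>();
        for (String line : describeOutput.split("\\R")) {
            Matcher m = LINE.matcher(line);
            if (!m.find()) {
                continue; // summary line or blank line
            }
            int partition = Integer.parseInt(m.group(1));
            String leader = m.group(2);
            String[] replicas = m.group(3).split(",");
            String[] isr = m.group(4).isEmpty() ? new String[0] : m.group(4).split(",");
            // The first entry in the replica list is the preferred leader.
            if (!leader.equals(replicas[0])) {
                notes.add("partition " + partition + ": leader " + leader
                    + " is not the preferred replica " + replicas[0]);
            }
            if (isr.length < replicas.length) {
                notes.add("partition " + partition + ": under-replicated ("
                    + isr.length + "/" + replicas.length + " in sync)");
            }
        }
        return notes;
    }

    public static void main(String[] args) {
        String degraded =
            "Topic: MissionControlTopic  Partition: 0    Leader: 1   Replicas: 0,1,2 Isr: 2,1\n"
          + "Topic: MissionControlTopic  Partition: 1    Leader: 1   Replicas: 1,2,0 Isr: 2,1\n";
        findAnomalies(degraded).forEach(System.out::println);
    }
}
```

Note that every partition in the second listing still reports a leader, which is why the LEADER_NOT_AVAILABLE errors described next are surprising.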

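At this point, most (but not all) producers and consumers can no longer write to or read from Kafka and keep logging LEADER_NOT_AVAILABLE exceptions (issue 1). Once node 0 has recovered and the leaders have been rebalanced, the applications still log the exception (issue 2). Only after the applications are restarted do they reconnect and start working properly. As you can imagine, restarting every application whenever a Kafka node has trouble is not practical.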

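I am not sure what information would be useful for troubleshooting this. We have scoured the internet and found nothing that suggests our configuration is obviously wrong. I have even tried to reproduce the problem locally, but there the applications reconnect correctly as soon as the node recovers.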

Here is the code that writes to Kafka:

Properties properties = new Properties();
properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaUrl);
properties.put(ProducerConfig.ACKS_CONFIG, "all");
properties.put(ProducerConfig.RETRIES_CONFIG, 0);
properties.put(ProducerConfig.LINGER_MS_CONFIG, 10);
properties.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 10000);
properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, GenericEventSerializer.class.getName());

kafkaProducer = new KafkaProducer<>(properties);

// And at some later point...
kafkaProducer.send(new ProducerRecord<>(TOPIC, event), (metadata, exception) -> {
    if (exception != null)
    {
        LOGGER.error("Failed to write to Kafka", exception);
    }
});
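
For comparison, here is a minimal sketch of the same producer settings with client-side retries enabled and a shorter metadata refresh interval. It is not a confirmed fix for the exceptions described above. The class name, the broker addresses, and the numeric values are assumptions chosen for illustration; the property keys (`retries`, `retry.backoff.ms`, `metadata.max.age.ms`) are standard producer settings, written as plain strings so the snippet compiles without the Kafka client on the classpath. The serializer settings would be added exactly as in the code above.

```java
import java.util.Properties;

public class ProducerRetrySketch {
    /** Builds producer properties similar to the ones above, but with retries enabled. */
    public static Properties build(String bootstrapServers) {
        Properties p = new Properties();
        // Listing every broker means the client can still bootstrap when one node is down.
        p.put("bootstrap.servers", bootstrapServers);
        p.put("acks", "all");
        p.put("retries", 5);                  // assumption: illustrative value, the original uses 0
        p.put("retry.backoff.ms", 500);       // assumption: pause between retry attempts
        p.put("metadata.max.age.ms", 30000);  // assumption: refresh cluster metadata every 30 s
        p.put("linger.ms", 10);
        p.put("max.block.ms", 10000);
        return p;
    }

    public static void main(String[] args) {
        // Hypothetical broker addresses.
        Properties p = build("kafka0:9092,kafka1:9092,kafka2:9092");
        p.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With `retries` greater than 0 and more than one in-flight request per connection, a retried batch can be written after a later one, so record ordering may change; whether that matters depends on the application.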

Here is the code that reads from it:

Properties props = new Properties();
props.put("enable.auto.commit", false);
props.put("bootstrap.servers", kafkaHostString);
props.put("group.id", consumerGroupId);
props.put("request.timeout.ms", 15000);
props.put("session.timeout.ms", 10000);
props.put("max.poll.records", 10000);
props.put("batch.size", 6400000);

Consumer<String, GenericEvent> consumer = new KafkaConsumer<>(props, new StringDeserializer(), new GenericEventDeserializer());
consumer.subscribe(Collections.singleton(topic));

// And at some later point ...
records = consumer.poll(pollTimeout);
consumer.commitSync();

advertised.host.name, advertised.port, and advertised.listeners are all set in server.properties.

0 answers:

No answers yet.