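Quickly detecting a Kafka node lost to a network failure

Time: 2017-05-09 08:47:30

Tags: apache-kafka

We run Kafka in a 3-node setup, with Kafka and ZooKeeper on each node. The topic has 1 partition and 2 replicas, for example: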

Topic:someTopic    PartitionCount:1    ReplicationFactor:2    Configs:retention.ms=600000
    Topic: someTopic    Partition: 0    Leader: 2    Replicas: 2,0    Isr: 2,0

We use the following settings.

Consumer settings:

fetch.min.bytes=1
enable.auto.commit=true
max.partition.fetch.bytes=1073741824

Producer settings:

metadata.fetch.timeout.ms=1000

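If we stop Kafka and ZooKeeper on one node with 'kill -9', Kafka detects the lost leader within a few seconds and switches leadership to the other replica, and the consumer continues to receive messages.

If, on the other hand, we take down the network of the same node with 'ifdown eth0' (which breaks connectivity to both Kafka and ZooKeeper on that node), Kafka seems to have trouble detecting that the broker is lost: it takes up to 2 minutes before more messages can be consumed on the affected topic.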
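One difference between the two scenarios is visible at the TCP level, and it may account for part of the delay. When a process is killed, the kernel closes its sockets, so peers see a reset or a refused connection right away. When the interface goes down, nothing is sent, so peers only find out when a timeout expires. The following is a minimal diagnostic sketch of that distinction. It is not how Kafka itself detects failures, and the function name `probe_broker` and its default timeout are assumptions made for illustration.

```python
import socket


def probe_broker(host: str, port: int, timeout_s: float = 1.0) -> str:
    """Classify one TCP connection attempt to a broker's listener port.

    Hypothetical helper for illustration; not part of any Kafka client.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "reachable"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on the port
        # (for example, the broker process was killed).
        return "refused"
    except socket.timeout:
        # No reply at all within the deadline
        # (for example, the host's interface is down).
        return "timeout"
    except OSError:
        # Other network errors, such as "no route to host".
        return "unreachable"
```

Calling `probe_broker(host, 9092)` from a client machine during each test (9092 being Kafka's default listener port) would show which case applies: a refused connection is reported immediately, while a silent drop is only reported after `timeout_s` has elapsed.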

The following log can be seen on the consumer:

[2017-05-04 15:44:26,916] WARN Auto offset commit failed for group console-consumer-75510: Commit offsets failed with retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

And on the producer:

May 04 15:44:18: 15:44:18.420 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
May 04 15:44:18: 15:44:18.435 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
May 04 15:44:18: 15:44:18.440 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
May 04 15:44:18: 15:44:18.442 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
May 04 15:44:18: 15:44:18.444 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
May 04 15:44:18: org.apache.kafka.common.errors.TimeoutException: Batch containing 31 record(s) expired due to timeout while requesting metadata from brokers for someTopic-0
May 04 15:44:18: 15:44:18.446 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'Heartbeat.Heartbeat' failed
May 04 15:44:18: org.apache.kafka.common.errors.TimeoutException: Batch containing 31 record(s) expired due to timeout while requesting metadata from brokers for someTopic-0
May 04 15:44:18: 15:44:18.448 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'Heartbeat.Heartbeat' failed
May 04 15:44:18: org.apache.kafka.common.errors.TimeoutException: Batch containing 31 record(s) expired due to timeout while requesting metadata from brokers for someTopic-0
May 04 15:44:18: 15:44:18.449 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'Heartbeat.Heartbeat' failed

...and it keeps printing those.
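To measure how long the outage really lasts, the producer log can be summarized rather than read by eye. The sketch below assumes the line format shown in the excerpt above (an application `ERROR` line carrying a millisecond timestamp, followed by a Kafka exception line). The function name `summarize` and the shape of its result are illustrative choices, not part of any tool.

```python
import re
from collections import Counter
from datetime import datetime

# Application error line, e.g.
# "... 15:44:18.420 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'someTopic' failed"
ERROR_LINE = re.compile(
    r"(\d{2}:\d{2}:\d{2}\.\d{3}) \[.*?\] ERROR - app Publishing to topic '([^']+)' failed"
)
# Exception line, e.g. "... org.apache.kafka.common.errors.NetworkException: ..."
EXCEPTION_LINE = re.compile(r"org\.apache\.kafka\.common\.errors\.(\w+):")


def summarize(lines):
    """Return the error time span, per-topic counts, and per-exception counts."""
    stamps = []
    topics = Counter()
    exceptions = Counter()
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%H:%M:%S.%f"))
            topics[match.group(2)] += 1
            continue
        match = EXCEPTION_LINE.search(line)
        if match:
            exceptions[match.group(1)] += 1
    # Assumes all lines fall on the same day; a log crossing midnight
    # would need the date prefix parsed as well.
    span = (max(stamps) - min(stamps)).total_seconds() if stamps else 0.0
    return {
        "span_seconds": span,
        "topics": dict(topics),
        "exceptions": dict(exceptions),
    }
```

Running `summarize(open(path))` over the full producer log for each test would give the duration of the error burst and show whether `NetworkException` or `TimeoutException` dominates.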

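Is there a way to make Kafka detect the failure and rebalance when a node goes down because of a network loss, just as it does when Kafka is killed?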

0 answers:

No answers yet