We run Kafka in a 3-node setup, with Kafka and Zookeeper on each node. Topics have 1 partition and 2 replicas, e.g.:
Topic:someTopic PartitionCount:1 ReplicationFactor:2 Configs:retention.ms=600000
Topic: someTopic Partition: 0 Leader: 2 Replicas: 2,0 Isr: 2,0
We use the following settings.
Consumer settings:
fetch.min.bytes=1
enable.auto.commit=true
max.partition.fetch.bytes=1073741824
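For reference, applying those settings through the Java consumer client looks roughly like this (the broker list, group id and deserializer classes are placeholders, not our real values):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SomeTopicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // placeholder broker list, one entry per node of the 3-node cluster
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "node1:9092,node2:9092,node3:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "some-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // the settings listed above
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "1073741824");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("someTopic"));
        while (true) {
            // poll(long) as in the 0.10.x client
            ConsumerRecords<String, String> records = consumer.poll(1000);
            records.forEach(r -> System.out.println(r.value()));
        }
    }
}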
Producer settings:
metadata.fetch.timeout.ms=1000
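And the producer side in the same spirit (again, broker list and serializer classes are placeholders; the topic and the metadata.fetch.timeout.ms value match what we use):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SomeTopicProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // placeholder broker list, one entry per node of the 3-node cluster
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "node1:9092,node2:9092,node3:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // the setting listed above; old 0.10.x-era name, passed as a plain string key
        props.put("metadata.fetch.timeout.ms", "1000");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.send(new ProducerRecord<>("someTopic", "key", "value"), (metadata, exception) -> {
            if (exception != null) {
                // the NetworkException / TimeoutException shown below surface here
                System.err.println("Publishing to topic 'someTopic' failed: " + exception);
            }
        });
        producer.flush();
        producer.close();
    }
}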
If we stop Kafka and Zookeeper on one node with 'kill -9', Kafka detects the lost leader within a few seconds, leadership moves to the other replica, and consumers keep receiving messages.
If instead we take down the network on the same node with 'ifdown eth0' (which breaks the connections to both Kafka and Zookeeper on that node), Kafka appears to have trouble detecting the lost broker: it takes up to 2 minutes before messages can be consumed again on the affected topic.
On the consumer we see the following log:
[2017-05-04 15:44:26,916] WARN Auto offset commit failed for group console-consumer-75510: Commit offsets failed with retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
and on the producer:
May 04 15:44:18: 15:44:18.420 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
May 04 15:44:18: 15:44:18.435 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
May 04 15:44:18: 15:44:18.440 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
May 04 15:44:18: 15:44:18.442 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
May 04 15:44:18: 15:44:18.444 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'someTopic' failed
May 04 15:44:18: org.apache.kafka.common.errors.NetworkException: The server disconnected before a response was received.
May 04 15:44:18: org.apache.kafka.common.errors.TimeoutException: Batch containing 31 record(s) expired due to timeout while requesting metadata from brokers for someTopic-0
May 04 15:44:18: 15:44:18.446 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'Heartbeat.Heartbeat' failed
May 04 15:44:18: org.apache.kafka.common.errors.TimeoutException: Batch containing 31 record(s) expired due to timeout while requesting metadata from brokers for someTopic-0
May 04 15:44:18: 15:44:18.448 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'Heartbeat.Heartbeat' failed
May 04 15:44:18: org.apache.kafka.common.errors.TimeoutException: Batch containing 31 record(s) expired due to timeout while requesting metadata from brokers for someTopic-0
May 04 15:44:18: 15:44:18.449 [kafka-producer-network-thread | producer-2] ERROR - app Publishing to topic 'Heartbeat.Heartbeat' failed
... and it keeps printing those.
Is there a way to make Kafka detect the failure and rebalance when a node goes down due to network loss, the same way it does when Kafka is simply killed?