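Kafka keeps rebalancing when consumption is slow

Time: 2019-06-12 11:56:18

Tags: apache-kafka kafka-consumer-api spring-kafka

I have been testing some negative scenarios for our Kafka setup, and one of them is slow consumption. I put Thread.sleep(15000) in the listener method (a spring-kafka @KafkaListener) and set the concurrency to 3. I have 1 topic with 1 partition. I added 10 messages to the topic and started the service. When the 3 consumers start, they all reach the (Re-)joining group point, but only one of them (say consumer-2) gets to: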

Successfully joined group with generation X

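and starts slowly consuming the messages.

(By the way, I use the MANUAL_IMMEDIATE ack mode, but this also reproduces when I do not add the Acknowledgment parameter to the listener and do not acknowledge the messages.) What I see next is this: until all the messages have been processed by consumer-2, every 3 seconds (the default heartbeat interval) I get this message in the console: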

AbstractCoordinator$HeartbeatResponseHandler: [Consumer clientId=consumer-2, groupId=pixel-group] Attempt to heartbeat failed since group is rebalancing

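I would like to know why this happens. Only after all 10 messages have been processed does another rebalance take place, after which all 3 consumers print: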

Successfully joined group with generation X

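One of them is assigned the partition, and there are no more heartbeat problems. This only happens when I set the sleep interval to a value higher than the heartbeat interval. Normally the message appears once, when all the consumers start up, and the group settles quickly.

So, to summarize:

If the consumer's processing time is greater than the heartbeat interval, every consumer except the first one fails to complete the rebalance (perhaps they cannot talk to the slow leader). What I cannot understand is why this heartbeat error is so persistent. If the sleep is longer than the heartbeat interval, why can't the remaining consumers complete some kind of rebalance in between the leader's message consumption?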

Update: Kafka version 2.12-2.2.0, Spring-Kafka 2.2.3.RELEASE

1 answer:

Answer 0: (score: 0)

  

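...concurrency to 3. I have 1 topic with 1 partition...

You need at least as many partitions as consumers. Within a group, a partition can be consumed by only one consumer.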
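To make the partition limit concrete, here is a minimal sketch that spreads partitions over consumers round-robin and counts how many consumers end up with nothing. It is an illustration only, not Kafka's assignor code: the class PartitionAssignmentSketch, its methods, and the consumer names are hypothetical, and which consumer actually receives the single partition depends on the configured assignor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: this is NOT Kafka's partition assignor.
final class PartitionAssignmentSketch {

    // Hand out partitions 0..partitionCount-1 to the consumers round-robin.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("at least one consumer is required");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    // Count the consumers that were assigned no partition at all.
    static long idleConsumers(Map<String, List<Integer>> assignment) {
        return assignment.values().stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        List<String> consumers = Arrays.asList("consumer-0", "consumer-1", "consumer-2");

        // The setup from the question: concurrency 3, one partition.
        Map<String, List<Integer>> onePartition = assign(consumers, 1);
        System.out.println(onePartition + " idle=" + idleConsumers(onePartition));

        // With three partitions every consumer has work.
        Map<String, List<Integer>> threePartitions = assign(consumers, 3);
        System.out.println(threePartitions + " idle=" + idleConsumers(threePartitions));
    }
}
```

With one partition and a concurrency of 3, two of the three consumers stay idle in this sketch, which matches the question's observation that only one consumer is assigned the partition.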

Which version are you using? Since KIP-62 (Kafka 0.10.1.0), heartbeats are sent by the Kafka client on a background thread. So a rebalance should only occur if the listener takes longer than max.poll.interval.ms. Google KIP-62 for more information.

Edit

You should see logs like this while the listener is sleeping...

    2019-06-13 09:47:52.008 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Heartbeat thread started
    ...
    2019-06-13 09:47:52.072 INFO 61914 --- [ main] com.example.Rbgh664Application : Sleeping for 15
    2019-06-13 09:47:55.120 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Sending Heartbeat request to coordinator localhost:9092 (id: 2147483647 rack: null)
    2019-06-13 09:47:55.121 TRACE 61914 --- [hread | rbgh664] org.apache.kafka.clients.NetworkClient : [Consumer clientId=source, groupId=rbgh664] Sending HEARTBEAT {group_id=rbgh664,generation_id=45,member_id=source-82297cee-063a-4e0d-89d0-15cfbd6ef680} with correlation id 10 to node 2147483647
    2019-06-13 09:47:55.226 TRACE 61914 --- [hread | rbgh664] org.apache.kafka.clients.NetworkClient : [Consumer clientId=source, groupId=rbgh664] Completed receive from node 2147483647 for HEARTBEAT with correlation id 10, received {throttle_time_ms=0,error_code=0}
    2019-06-13 09:47:55.227 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Received successful Heartbeat response
    2019-06-13 09:47:58.120 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Sending Heartbeat request to coordinator localhost:9092 (id: 2147483647 rack: null)
    2019-06-13 09:47:58.120 TRACE 61914 --- [hread | rbgh664] org.apache.kafka.clients.NetworkClient : [Consumer clientId=source, groupId=rbgh664] Sending HEARTBEAT {group_id=rbgh664,generation_id=45,member_id=source-82297cee-063a-4e0d-89d0-15cfbd6ef680} with correlation id 11 to node 2147483647
    2019-06-13 09:47:58.225 TRACE 61914 --- [hread | rbgh664] org.apache.kafka.clients.NetworkClient : [Consumer clientId=source, groupId=rbgh664] Completed receive from node 2147483647 for HEARTBEAT with correlation id 11, received {throttle_time_ms=0,error_code=0}
    2019-06-13 09:47:58.226 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Received successful Heartbeat response
    2019-06-13 09:48:01.203 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Sending Heartbeat request to coordinator localhost:9092 (id: 2147483647 rack: null)
    2019-06-13 09:48:01.204 TRACE 61914 --- [hread | rbgh664] org.apache.kafka.clients.NetworkClient : [Consumer clientId=source, groupId=rbgh664] Sending HEARTBEAT {group_id=rbgh664,generation_id=45,member_id=source-82297cee-063a-4e0d-89d0-15cfbd6ef680} with correlation id 12 to node 2147483647
    2019-06-13 09:48:01.310 TRACE 61914 --- [hread | rbgh664] org.apache.kafka.clients.NetworkClient : [Consumer clientId=source, groupId=rbgh664] Completed receive from node 2147483647 for HEARTBEAT with correlation id 12, received {throttle_time_ms=0,error_code=0}
    2019-06-13 09:48:01.310 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Received successful Heartbeat response
    2019-06-13 09:48:04.285 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Sending Heartbeat request to coordinator localhost:9092 (id: 2147483647 rack: null)
    2019-06-13 09:48:04.286 TRACE 61914 --- [hread | rbgh664] org.apache.kafka.clients.NetworkClient : [Consumer clientId=source, groupId=rbgh664] Sending HEARTBEAT {group_id=rbgh664,generation_id=45,member_id=source-82297cee-063a-4e0d-89d0-15cfbd6ef680} with correlation id 13 to node 2147483647
    2019-06-13 09:48:04.390 TRACE 61914 --- [hread | rbgh664] org.apache.kafka.clients.NetworkClient : [Consumer clientId=source, groupId=rbgh664] Completed receive from node 2147483647 for HEARTBEAT with correlation id 13, received {throttle_time_ms=0,error_code=0}
    2019-06-13 09:48:04.390 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Received successful Heartbeat response
    ...
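As a rough check on the numbers in the question, the sketch below compares the time a listener spends on one batch against max.poll.interval.ms, the processing timeout that KIP-62 added alongside the background heartbeat thread. It assumes that all 10 records arrive in a single poll() and that the consumer runs with the Kafka 2.x default of 300000 ms. Neither assumption is confirmed by the post, and the class PollBudgetSketch and its methods are hypothetical names used only for this illustration.

```java
// Hypothetical arithmetic sketch; it does not call the Kafka client at all.
final class PollBudgetSketch {

    // Default of max.poll.interval.ms in the Kafka 2.x Java consumer (5 minutes).
    static final long DEFAULT_MAX_POLL_INTERVAL_MS = 300_000L;

    // Time the listener thread is busy between two poll() calls, assuming one
    // poll() returned `records` records and each takes `perRecordMs` to handle.
    static long batchProcessingMs(int records, long perRecordMs) {
        return records * perRecordMs;
    }

    // True if the batch takes longer than the allowed gap between polls.
    static boolean exceedsMaxPollInterval(int records, long perRecordMs, long maxPollIntervalMs) {
        return batchProcessingMs(records, perRecordMs) > maxPollIntervalMs;
    }

    public static void main(String[] args) {
        int records = 10;            // messages added to the topic in the question
        long perRecordMs = 15_000L;  // Thread.sleep(15000) in the listener

        System.out.println("batch takes " + batchProcessingMs(records, perRecordMs) + " ms");
        System.out.println("exceeds max.poll.interval.ms: "
                + exceedsMaxPollInterval(records, perRecordMs, DEFAULT_MAX_POLL_INTERVAL_MS));
    }
}
```

Under these assumptions the batch stays inside the poll interval, so the slow consumer would not be removed from the group. A plausible reading, offered here as an assumption rather than something the answer states, is that the consumer can only rejoin a rebalancing group from poll(), so the "group is rebalancing" heartbeat response would repeat until the listener finishes the batch and polls again.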