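I've been trying to test some negative scenarios for our Kafka consumer, one of which is slow consumption. I put Thread.sleep(15000) in the method annotated with @KafkaListener (spring-kafka) and set the concurrency to 3. I have 1 topic with 1 partition.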
I added 10 messages to the topic and started the service.
When the 3 consumers start, they all reach the (Re-)joining group point, but only one of them (say, consumer-2) gets to:
Successfully joined group with generation X
and starts consuming the messages slowly.
(By the way, I use the MANUAL_IMMEDIATE ack mode, but this also reproduces when I don't add the Acknowledgment parameter to the listener and don't acknowledge the messages.)
This is what I see next:
until consumer-2 has processed all the messages, every 3 seconds (the default heartbeat interval) I get this message in the console:
AbstractCoordinator$HeartbeatResponseHandler: [Consumer clientId=consumer-2, groupId=pixel-group] Attempt to heartbeat failed since group is rebalancing
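I'd like to understand why this happens. Only after all 10 messages have been processed does another rebalance take place, after which all 3 consumers print: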
Successfully joined group with generation X
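One of them is assigned the partition, and there are no more heartbeat problems. This only happens when I set the sleep interval to a value higher than the heartbeat interval. Normally it happens once, when all the consumers start, but the group settles quickly.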
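So, to sum up:
if the consumer's processing time > the heartbeat interval, all consumers except the first one fail to complete the rebalance (perhaps because they cannot talk to the slow leader).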
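What I can't understand is why this heartbeat error is so persistent.
If the sleep is longer than the heartbeat interval, why can't the other consumers complete some kind of rebalance between the leader's message consumptions?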
Update: Kafka version 2.12-2.2.0, Spring-Kafka 2.2.3.RELEASE
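For reference, the 3-second heartbeat mentioned above is only one of several consumer timing settings. The sketch below simply collects them in a plain java.util.Properties, set to what we understand to be the Kafka 2.2 client defaults. The class name is hypothetical, and in a spring-kafka application these values would normally be supplied through the consumer factory's configuration rather than built like this.

```java
import java.util.Properties;

public class ConsumerTimingDefaults {

    // Hypothetical helper: the consumer timing settings relevant to this
    // scenario, set explicitly to (what we believe are) the Kafka 2.2 defaults.
    static Properties timingDefaults() {
        Properties props = new Properties();
        // How often the background heartbeat thread contacts the group coordinator.
        props.put("heartbeat.interval.ms", "3000");
        // How long the coordinator waits without a heartbeat before evicting a member.
        props.put("session.timeout.ms", "10000");
        // Maximum allowed gap between two poll() calls (introduced by KIP-62).
        props.put("max.poll.interval.ms", "300000");
        // Maximum number of records a single poll() can return.
        props.put("max.poll.records", "500");
        return props;
    }

    public static void main(String[] args) {
        // Print every setting as key=value (iteration order is not guaranteed).
        timingDefaults().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The heartbeat interval only drives the background heartbeat thread; how long the listener may spend between polls is governed by max.poll.interval.ms.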
Answer
...set the concurrency to 3. I have 1 topic with 1 partition...
You need at least as many partitions as consumers; within a group, a partition can only be consumed by one consumer.
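A simplified, hypothetical sketch of why two of the three containers sit idle: it splits a single topic's partitions into contiguous ranges across the group members, roughly in the spirit of a range assignor. It is not Kafka's RangeAssignor (which, for instance, sorts members by member id first); it only illustrates that within one group each partition is owned by at most one consumer, so with concurrency 3 and 1 partition, two consumers get nothing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionAssignmentSketch {

    // Simplified range-style assignment for one topic: partitions are split into
    // contiguous ranges, one range per consumer, in the order consumers are given.
    // Consumers beyond the partition count receive an empty list.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int n = consumers.size();
        int perConsumer = partitions / n;
        int extra = partitions % n; // the first `extra` consumers get one more partition
        int next = 0;
        for (int i = 0; i < n; i++) {
            int count = perConsumer + (i < extra ? 1 : 0);
            List<Integer> assigned = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                assigned.add(next++);
            }
            result.put(consumers.get(i), assigned);
        }
        return result;
    }

    public static void main(String[] args) {
        // concurrency = 3 -> three consumers in the same group; the topic has 1 partition
        Map<String, List<Integer>> a =
                assign(List.of("consumer-1", "consumer-2", "consumer-3"), 1);
        a.forEach((consumer, parts) -> System.out.println(consumer + " -> " + parts));
    }
}
```

Running main prints consumer-1 -> [0] followed by two empty assignments; with 3 partitions, every consumer would own exactly one.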
Which version are you using? Since KIP-62 (Kafka 0.10.1.0), heartbeats are sent by the Kafka client on a background thread, so a rebalance only occurs if the listener takes longer than max.poll.interval.ms. Google KIP-62 for more information.
EDIT
While the listener is sleeping, you should see logs like this...
2019-06-13 09:47:52.008 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Heartbeat thread started
...
2019-06-13 09:47:52.072 INFO 61914 --- [ main] com.example.Rbgh664Application : Sleeping for 15
2019-06-13 09:47:55.120 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Sending Heartbeat request to coordinator localhost:9092 (id: 2147483647 rack: null)
2019-06-13 09:47:55.121 TRACE 61914 --- [hread | rbgh664] org.apache.kafka.clients.NetworkClient : [Consumer clientId=source, groupId=rbgh664] Sending HEARTBEAT {group_id=rbgh664,generation_id=45,member_id=source-82297cee-063a-4e0d-89d0-15cfbd6ef680} with correlation id 10 to node 2147483647
2019-06-13 09:47:55.226 TRACE 61914 --- [hread | rbgh664] org.apache.kafka.clients.NetworkClient : [Consumer clientId=source, groupId=rbgh664] Completed receive from node 2147483647 for HEARTBEAT with correlation id 10, received {throttle_time_ms=0,error_code=0}
2019-06-13 09:47:55.227 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Received successful Heartbeat response
2019-06-13 09:47:58.120 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Sending Heartbeat request to coordinator localhost:9092 (id: 2147483647 rack: null)
2019-06-13 09:47:58.120 TRACE 61914 --- [hread | rbgh664] org.apache.kafka.clients.NetworkClient : [Consumer clientId=source, groupId=rbgh664] Sending HEARTBEAT {group_id=rbgh664,generation_id=45,member_id=source-82297cee-063a-4e0d-89d0-15cfbd6ef680} with correlation id 11 to node 2147483647
2019-06-13 09:47:58.225 TRACE 61914 --- [hread | rbgh664] org.apache.kafka.clients.NetworkClient : [Consumer clientId=source, groupId=rbgh664] Completed receive from node 2147483647 for HEARTBEAT with correlation id 11, received {throttle_time_ms=0,error_code=0}
2019-06-13 09:47:58.226 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Received successful Heartbeat response
2019-06-13 09:48:01.203 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Sending Heartbeat request to coordinator localhost:9092 (id: 2147483647 rack: null)
2019-06-13 09:48:01.204 TRACE 61914 --- [hread | rbgh664] org.apache.kafka.clients.NetworkClient : [Consumer clientId=source, groupId=rbgh664] Sending HEARTBEAT {group_id=rbgh664,generation_id=45,member_id=source-82297cee-063a-4e0d-89d0-15cfbd6ef680} with correlation id 12 to node 2147483647
2019-06-13 09:48:01.310 TRACE 61914 --- [hread | rbgh664] org.apache.kafka.clients.NetworkClient : [Consumer clientId=source, groupId=rbgh664] Completed receive from node 2147483647 for HEARTBEAT with correlation id 12, received {throttle_time_ms=0,error_code=0}
2019-06-13 09:48:01.310 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Received successful Heartbeat response
2019-06-13 09:48:04.285 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Sending Heartbeat request to coordinator localhost:9092 (id: 2147483647 rack: null)
2019-06-13 09:48:04.286 TRACE 61914 --- [hread | rbgh664] org.apache.kafka.clients.NetworkClient : [Consumer clientId=source, groupId=rbgh664] Sending HEARTBEAT {group_id=rbgh664,generation_id=45,member_id=source-82297cee-063a-4e0d-89d0-15cfbd6ef680} with correlation id 13 to node 2147483647
2019-06-13 09:48:04.390 TRACE 61914 --- [hread | rbgh664] org.apache.kafka.clients.NetworkClient : [Consumer clientId=source, groupId=rbgh664] Completed receive from node 2147483647 for HEARTBEAT with correlation id 13, received {throttle_time_ms=0,error_code=0}
2019-06-13 09:48:04.390 DEBUG 61914 --- [hread | rbgh664] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=source, groupId=rbgh664] Received successful Heartbeat response
...
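The timing in the question can be checked with some back-of-the-envelope arithmetic. The sketch below assumes the Kafka 2.2 defaults (max.poll.records = 500, max.poll.interval.ms = 300000) and that all 10 records come back from a single poll(); the class and method names are hypothetical. Under those assumptions the listener thread stays away from poll() for about 150 s, well under max.poll.interval.ms, so the slow consumer is never evicted from the group. As far as we understand the consumer protocol, a member only rejoins a pending rebalance from inside poll(), which would be consistent with the question's observation that the rebalance completes only after all 10 messages have been processed.

```java
public class PollGapSketch {

    // Approximate time between two poll() calls when one poll returns
    // `records` records and the listener spends `perRecordMs` on each.
    static long pollGapMs(int records, long perRecordMs) {
        return records * perRecordMs; // int * long is computed as long
    }

    // Slow processing alone only gets the member removed (forcing a rebalance)
    // when the gap between polls exceeds max.poll.interval.ms; heartbeats are
    // sent by the background thread regardless of the listener.
    static boolean exceedsPollInterval(long pollGapMs, long maxPollIntervalMs) {
        return pollGapMs > maxPollIntervalMs;
    }

    public static void main(String[] args) {
        int recordsInTopic = 10;
        int maxPollRecords = 500;          // assumed Kafka 2.2 default
        long sleepMs = 15_000;             // Thread.sleep(15000) in the listener
        long maxPollIntervalMs = 300_000;  // assumed Kafka 2.2 default

        // Assumption: all 10 records are returned by a single poll().
        int perPoll = Math.min(recordsInTopic, maxPollRecords);
        long gap = pollGapMs(perPoll, sleepMs);
        System.out.println("gap between polls: " + gap + " ms");
        System.out.println("exceeds max.poll.interval.ms: "
                + exceedsPollInterval(gap, maxPollIntervalMs));
    }
}
```

Running main prints gap between polls: 150000 ms and exceeds max.poll.interval.ms: false. With 30 such records in one poll (450 s), the check would flip to true.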