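During our last Kafka maintenance (which required a rolling restart of the Kafka brokers), we saw the consumer group offsets reset for some partitions.
At 11:14 am everything looked fine for the consumer group, and we saw no consumer lag: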
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 0 105130857 105130857 0 st-...
...
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 6 78591770 78591770 0 st-...
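However, five minutes later, during the rolling restart of the brokers, the offset of one partition was reset, and millions of events were consumed again.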
$ bin/kafka-consumer-groups --bootstrap-server XXX:9093,XXX... --command-config secrets.config --group st-xx --describe
Note: This will not show information about old Zookeeper-based consumers.
[2019-08-26 12:44:13,539] WARN Connection to node -5 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2019-08-26 12:44:13,707] WARN [Consumer clientId=consumer-1, groupId=st-xx] Connection to node -5 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
Consumer group 'st-xx' has no active members.
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 0 105132096 105132275 179
...
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 6 15239401 78593165 63353764 ...
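In the two hours since, the offset for that partition has not recovered, and we now need to patch it manually. We ran into a similar problem during the previous rolling restart of the brokers.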
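For reference, the manual patch we have in mind could look roughly like the sketch below. It only prints the command for review rather than running it. The topic name `my-topic` is a placeholder (the real name is masked in the output above), and the target offset is an assumption: the last committed offset we saw for partition 6 before the restart, so a few already-processed events may be consumed again. As far as we know, `--reset-offsets` only works while the group has no active members, and without `--execute` the tool only previews the change.

```shell
#!/usr/bin/env bash
# Sketch only: builds the reset command as text so it can be reviewed first.
GROUP="st-xx"
TOPIC="my-topic"          # hypothetical; the real topic name is masked above
PARTITION=6
TARGET_OFFSET=78591770    # assumption: committed offset seen before the restart
BOOTSTRAP="XXX:9093"      # placeholder, as in the describe command above

# The LAG column is LOG-END-OFFSET minus CURRENT-OFFSET.
lag() { echo $(( $1 - $2 )); }

# Prints the reset command. Pass --execute to get the variant that applies
# the change; with no argument the printed command is a preview only.
reset_cmd() {
  echo "bin/kafka-consumer-groups --bootstrap-server ${BOOTSTRAP}" \
       "--command-config secrets.config --group ${GROUP}" \
       "--reset-offsets --topic ${TOPIC}:${PARTITION}" \
       "--to-offset ${TARGET_OFFSET} ${1:-}"
}

lag 78593165 15239401     # partition 6 after the restart; prints 63353764
reset_cmd                 # preview variant
reset_cmd --execute       # variant that would actually move the offset
```

Running the preview variant first and comparing the reported new offset against the describe output seems like the safer order before applying anything.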
Has anyone seen something like this before? The only clue we could find is this ticket, but we are running Kafka version 1.0.1-kafka3.1.0,