Asynchronous auto-commit of offsets failed

Posted: 2020-09-24 10:08:18

Tags: apache-kafka kafka-consumer-api spring-kafka

I have a question about Kafka's auto-commit mechanism. I am using Spring-Kafka with auto-commit enabled. As an experiment, I disconnected my consumer from Kafka for 30 seconds while the system was idle (no new messages on the topic, none being processed). After reconnecting, I got a number of messages like this:

Asynchronous auto-commit of offsets {cs-1915-2553221872080030-0=OffsetAndMetadata{offset=19, leaderEpoch=0, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.

First, I don't understand what there is to commit: the system was idle (all previous messages had already been committed). Second, the disconnect lasted 30 seconds, far less than the max.poll.interval.ms of 5 minutes (300000 ms). Third, during an uncontrolled Kafka failure I received at least 30,000 messages of this kind, which was only resolved by restarting the process. Why does this happen?

My consumer configuration is listed below:

        allow.auto.create.topics = true
        auto.commit.interval.ms = 100
        auto.offset.reset = latest
        bootstrap.servers = [kafka1-eu.dev.com:9094, kafka2-eu.dev.com:9094, kafka3-eu.dev.com:9094]
        check.crcs = true
        client.dns.lookup = default
        client.id =
        client.rack =
        connections.max.idle.ms = 540000
        default.api.timeout.ms = 60000
        enable.auto.commit = true
        exclude.internal.topics = true
        fetch.max.bytes = 52428800
        fetch.max.wait.ms = 500
        fetch.min.bytes = 1
        group.id = feature-cs-1915-2553221872080030
        group.instance.id = null
        heartbeat.interval.ms = 3000
        interceptor.classes = []
        internal.leave.group.on.close = true
        isolation.level = read_uncommitted
        key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
        max.partition.fetch.bytes = 1048576
        max.poll.interval.ms = 300000
        max.poll.records = 500
        metadata.max.age.ms = 300000
        metric.reporters = []
        metrics.num.samples = 2
        metrics.recording.level = INFO
        metrics.sample.window.ms = 30000
        partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
        receive.buffer.bytes = 65536
        reconnect.backoff.max.ms = 1000
        reconnect.backoff.ms = 50
        request.timeout.ms = 30000
        retry.backoff.ms = 100
        sasl.client.callback.handler.class = null
        sasl.jaas.config = null
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.min.time.before.relogin = 60000
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        sasl.kerberos.ticket.renew.window.factor = 0.8
        sasl.login.callback.handler.class = null
        sasl.login.class = null
        sasl.login.refresh.buffer.seconds = 300
        sasl.login.refresh.min.period.seconds = 60
        sasl.login.refresh.window.factor = 0.8
        sasl.login.refresh.window.jitter = 0.05
        sasl.mechanism = GSSAPI
        security.protocol = SSL
        send.buffer.bytes = 131072
        session.timeout.ms = 15000
        ssl.cipher.suites = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.endpoint.identification.algorithm = https
        ssl.key.password = [hidden]
        ssl.keymanager.algorithm = SunX509
        ssl.keystore.location = /home/me/feature-2553221872080030.keystore
        ssl.keystore.password = [hidden]
        ssl.keystore.type = JKS
        ssl.protocol = TLS
        ssl.provider = null
        ssl.secure.random.implementation = null
        ssl.trustmanager.algorithm = PKIX
        ssl.truststore.location = /home/me/feature-2553221872080030.truststore
        ssl.truststore.password = [hidden]
        ssl.truststore.type = JKS
        value.deserializer = class org.springframework.kafka.support.serializer.ErrorHandlingDeserializer2

1 Answer:

Answer 0 (score: 1)

First, I don't understand what there is to commit?

You are right: if no new data is flowing in, there is nothing to commit. However, with auto-commit enabled and your consumer still running (even though it cannot reach the broker), the poll method remains responsible for the following steps:

  • Fetching messages from the assigned partitions
  • Triggering a partition assignment, if necessary
  • Committing offsets, if auto offset commit is enabled

Together with your 100 ms interval (see auto.commit.interval.ms), the consumer therefore keeps trying to commit its (unchanged) offset position asynchronously.
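A minimal sketch with the plain KafkaConsumer API makes this concrete (broker address, group id, and topic name are placeholders, not values from your setup). With enable.auto.commit=true, every poll() call checks whether auto.commit.interval.ms has elapsed and, if so, schedules an asynchronous commit of the current offsets, even when no records were fetched:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class IdleAutoCommitDemo {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "idle-demo-group");         // placeholder group
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
            props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "100");      // same 100 ms as in the question
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("some-idle-topic")); // placeholder topic
                while (true) {
                    // Even when this returns zero records, poll() internally checks
                    // whether auto.commit.interval.ms has elapsed and, if so, fires
                    // an asynchronous commit of the current (unchanged) offsets.
                    // Those are the commits that fail once the broker has removed
                    // the consumer from the group.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    records.forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
                }
            }
        }
    }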

Second, the disconnect lasted 30 seconds, far less than the max.poll.interval.ms of 5 minutes (300000 ms).

It is not the max poll interval that triggers the rebalance, but the combination of the heartbeat.interval.ms and session.timeout.ms settings. Your consumer sends heartbeats from a background thread at the configured interval (3 seconds in your case). If the broker receives no heartbeat before the session timeout expires (15 seconds in your case), it removes the client from the group and initiates a rebalance.
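As an illustration of how the two settings interact (the "more tolerant" numbers below are mine, not a recommendation; the broker also has to allow them via its group.min.session.timeout.ms / group.max.session.timeout.ms bounds):

    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;

    public class GroupLivenessConfig {

        // The values from the question: a heartbeat from the background thread
        // every 3 s, and broker-side eviction after 15 s without one. A 30 s
        // disconnect therefore exceeds the session budget long before
        // max.poll.interval.ms (5 min) becomes relevant.
        static Properties questionValues() {
            Properties props = new Properties();
            props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000");
            props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "15000");
            return props;
        }

        // One way to survive a roughly 30 s network blip: raise the session
        // timeout and keep the heartbeat at about a third of it. Illustrative
        // values only; the right numbers depend on how quickly you want dead
        // consumers to be detected.
        static Properties moreTolerantValues() {
            Properties props = new Properties();
            props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "15000");
            props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "45000");
            return props;
        }
    }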

The configurations I mentioned are described in more detail in the Kafka documentation on Consumer Configs.

Third, during an uncontrolled Kafka failure I received at least 30,000 messages of this kind, which was only resolved by restarting the process. Why does this happen?

This looks like a combination of the first two points: heartbeats could not be sent, while the consumer still tried to commit through the continuously called poll method.

As @GaryRussell mentioned in his comment, I would be careful with enable.auto.commit and would rather take control of offset management myself.
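For completeness, here is one way to do that in Spring Kafka (a sketch assuming spring-kafka 2.3+; bootstrap servers, group id, and topic are placeholders): disable auto-commit, set the container's ack mode to MANUAL, and commit explicitly per message once processing has succeeded:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
    import org.springframework.kafka.core.ConsumerFactory;
    import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
    import org.springframework.kafka.listener.ContainerProperties;
    import org.springframework.kafka.support.Acknowledgment;

    @Configuration
    public class ManualAckConfig {

        @Bean
        public ConsumerFactory<String, String> consumerFactory() {
            Map<String, Object> props = new HashMap<>();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "manual-ack-group");        // placeholder
            // No client-side auto-commit: the listener container now owns commits.
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            return new DefaultKafkaConsumerFactory<>(props);
        }

        @Bean
        public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
                ConsumerFactory<String, String> consumerFactory) {
            ConcurrentKafkaListenerContainerFactory<String, String> factory =
                    new ConcurrentKafkaListenerContainerFactory<>();
            factory.setConsumerFactory(consumerFactory);
            // MANUAL: an offset is committed only when the listener calls acknowledge().
            factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
            return factory;
        }

        @KafkaListener(topics = "some-topic") // placeholder topic
        public void listen(String message, Acknowledgment ack) {
            // process the message, then commit its offset explicitly
            ack.acknowledge();
        }
    }

With this setup nothing is committed while the consumer is idle, so the flood of failed async-commit warnings cannot occur; the rebalance itself, however, is still governed by the heartbeat and session-timeout settings discussed above.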