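What is the correct way to manually commit offsets to a Kafka topic?

Time: 2019-02-07 08:26:48

Tags: python python-3.x apache-kafka kafka-python

I have a consumer script that processes each message and manually commits its offset to the topic.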

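import json

from kafka import KafkaConsumer, TopicPartition
from kafka.coordinator.assignors.roundrobin import RoundRobinPartitionAssignor
from kafka.structs import OffsetAndMetadata

# KAFKA_TOPIC, KAFKA_SERVER, CONSUMER_GROUP and LOGGER are defined elsewhere in the script
CONSUMER = KafkaConsumer(
    KAFKA_TOPIC,
    bootstrap_servers=[KAFKA_SERVER],
    auto_offset_reset="earliest",
    max_poll_records=100,
    # Auto-commit is off, so offsets are committed manually in the loop
    enable_auto_commit=False,
    group_id=CONSUMER_GROUP,
    # Use round-robin partition assignment
    partition_assignment_strategy=[RoundRobinPartitionAssignor],
    value_deserializer=lambda x: json.loads(x.decode('utf-8'))
)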

count = 0
while True:
    count += 1
    LOGGER.info("--------------Poll {0}---------".format(count))
    for msg in CONSUMER:
        # Process msg.value
        # Commit the offset of the next message to read (last processed + 1)
        tp = TopicPartition(msg.topic, msg.partition)
        offsets = {tp: OffsetAndMetadata(msg.offset + 1, None)}
        CONSUMER.commit(offsets=offsets)
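
A note on the commit step: Kafka treats a committed offset as the position of the next message to read, so the value to commit is the last processed offset plus one. Committing once per batch rather than once per message also reduces the number of commit round trips. The following is only a minimal sketch of that bookkeeping, not code from the original post; the helper name `offsets_to_commit` and the plain `(topic, partition, offset)` tuples are assumptions made for illustration, and the mapping to kafka-python types is shown in comments only.

```python
def offsets_to_commit(processed):
    """Return {(topic, partition): next_offset} for processed messages.

    `processed` is an iterable of (topic, partition, offset) tuples, one per
    message that has been fully handled. This helper is hypothetical and is
    not part of kafka-python.
    """
    result = {}
    for topic, partition, offset in processed:
        key = (topic, partition)
        next_offset = offset + 1  # committed offset = next message to read
        if next_offset > result.get(key, 0):
            result[key] = next_offset
    return result


# With kafka-python 1.4.x the result could be turned into a commit call
# roughly like this (assumes CONSUMER, TopicPartition and OffsetAndMetadata
# from the script above):
#
#   CONSUMER.commit(offsets={
#       TopicPartition(topic, partition): OffsetAndMetadata(offset, None)
#       for (topic, partition), offset in offsets_to_commit(batch).items()
#   })
```

The helper keeps the highest next offset per partition, so it does not matter in which order the processed messages are passed in.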

Processing each message takes less than 1 second.

I get this error:

kafka.errors.CommitFailedError: CommitFailedError: Commit cannot be completed since the group has already
            rebalanced and assigned the partitions to another member.
            This means that the time between subsequent calls to poll()
            was longer than the configured max_poll_interval_ms, which
            typically implies that the poll loop is spending too much
            time message processing. You can address this either by
            increasing the rebalance timeout with max_poll_interval_ms,
            or by reducing the maximum size of batches returned in poll()
            with max_poll_records.


Process finished with exit code 1

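What I would like to know:

a) How do I resolve this error?

b) How can I make sure my manual commits are working correctly?

c) What is the correct way to commit offsets?

I have already read "Difference between session.timeout.ms and max.poll.interval.ms for Kafka 0.10.0.0 and later versions", but it did not help me understand my problem. Any help with tuning the poll, session, or heartbeat timeouts would be much appreciated.

Apache Kafka: 2.11-2.1.0, kafka-python: 1.4.4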

1 answer:

Answer 0 (score: 0):

The consumer's session.timeout.ms should be less than group.max.session.timeout.ms on the Kafka broker.
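
To make that constraint concrete, here is a small sketch that checks a set of consumer timeouts before they are passed to `KafkaConsumer`. The function name `check_consumer_timeouts` is hypothetical, and the broker limits used as defaults (6000 ms and 300000 ms) are assumptions about a stock Kafka 2.1 broker, so read the real values from your broker configuration. The heartbeat check follows the usual recommendation that heartbeat.interval.ms be no more than one third of session.timeout.ms.

```python
def check_consumer_timeouts(session_timeout_ms, heartbeat_interval_ms,
                            broker_min_session_ms=6000,
                            broker_max_session_ms=300000):
    """Return a list of human-readable problems; empty if none are found.

    The broker_* defaults are assumed values for group.min.session.timeout.ms
    and group.max.session.timeout.ms; check your own broker settings.
    """
    problems = []
    # The session timeout has to fall inside the range the broker allows.
    if not broker_min_session_ms <= session_timeout_ms <= broker_max_session_ms:
        problems.append(
            "session_timeout_ms=%d is outside the broker range [%d, %d]"
            % (session_timeout_ms, broker_min_session_ms, broker_max_session_ms)
        )
    # Recommended: heartbeat interval at most one third of the session timeout.
    if heartbeat_interval_ms * 3 > session_timeout_ms:
        problems.append(
            "heartbeat_interval_ms=%d is more than 1/3 of session_timeout_ms=%d"
            % (heartbeat_interval_ms, session_timeout_ms)
        )
    return problems


# Example: these values pass both checks.
print(check_consumer_timeouts(session_timeout_ms=10000,
                              heartbeat_interval_ms=3000))
```

Values that pass can then be supplied to kafka-python through the `session_timeout_ms` and `heartbeat_interval_ms` arguments of `KafkaConsumer`.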