Question

I am trying to get the latest offset (not the committed offset) from each partition of a given topic.

from kafka import KafkaConsumer, TopicPartition

topic = 'test-topic'
broker = 'localhost:9092'

consumer = KafkaConsumer(bootstrap_servers=broker)

tp = TopicPartition(topic, 0)        #1
consumer.assign([tp])                #2
consumer.seek_to_end(tp)             #3
last_offset = consumer.position(tp)  #4

for i in consumer.partitions_for_topic(topic):
    tp = TopicPartition(topic, i)
    consumer.assign([tp])
    consumer.seek_to_end(tp)
    last_offset = consumer.position(tp)
    print(last_offset)

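The code above works and prints the offset of each partition. However, notice that the same four lines appear both outside and inside the loop. If I remove any of lines #1-#4 (the four lines before the for loop), I get this error:

  File "check_kafka_offset.py", line 19, in <module>
    for i in consumer.partitions_for_topic(topic):
TypeError: 'NoneType' object is not iterable

Why do I need the four lines before the for loop?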

Answer 1

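You can use the end_offsets(partitions) function of that client to get the last offset of the specified partitions. Note that the returned offset is the next offset, i.e. the current end + 1. Documentation here.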
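To make the "next offset" point concrete, here is a minimal sketch that converts a dict shaped like the result of end_offsets into the offset of the newest record in each partition. The helper name last_written_offsets and the sample values are hypothetical, and a namedtuple stands in for kafka.TopicPartition so the snippet runs without a broker.

```python
from collections import namedtuple

# Stand-in for kafka.TopicPartition so the sketch runs without a broker
# or the kafka-python package installed (an assumption for illustration).
TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])


def last_written_offsets(end_offsets):
    """Map each partition to the offset of its newest record.

    end_offsets holds the *next* offset per partition, so the newest
    existing record sits at end - 1. An end offset of 0 means nothing
    has been written yet, which is reported as None.
    """
    return {
        tp: (end - 1 if end > 0 else None)
        for tp, end in end_offsets.items()
    }


# Hypothetical values shaped like the dict end_offsets() returns.
sample = {
    TopicPartition("python-test", 0): 5,
    TopicPartition("python-test", 1): 20,
    TopicPartition("python-test", 2): 0,
}

for tp, offset in sorted(last_written_offsets(sample).items()):
    print(tp.partition, offset)
```

This sketch only treats an end offset of 0 as "empty". A partition whose records have all been deleted by retention would also be empty while having a non-zero end offset, so comparing against beginning_offsets would be needed to cover that case.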

Edit: example implementation:

from kafka import KafkaConsumer, TopicPartition
import json
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

BOOTSTRAP="""cluster:9092"""
API_KEY="""redacted"""
API_SECRET="""redacted"""
TOPIC="python-test"

consumer = KafkaConsumer(
    group_id="my-group",
    bootstrap_servers=[BOOTSTRAP],
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username=API_KEY,
    sasl_plain_password=API_SECRET,
    value_deserializer=lambda m: json.loads(m.decode('ascii')),
    auto_offset_reset='earliest'
)

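# Build a TopicPartition for every partition of the topic.
PARTITIONS = []
for partition in consumer.partitions_for_topic(TOPIC):
    PARTITIONS.append(TopicPartition(TOPIC, partition))

# end_offsets returns a dict mapping each TopicPartition to its next offset.
partitions = consumer.end_offsets(PARTITIONS)
print(partitions)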

And the end_offsets output looks like this:

{TopicPartition(topic=u'python-test', partition=0): 5,
 TopicPartition(topic=u'python-test', partition=1): 20,
 TopicPartition(topic=u'python-test', partition=2): 0}
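The TypeError in the question arises because partitions_for_topic returned None, which kafka-python can do when it has no metadata for the topic. The following is a rough sketch, not the library's own recipe, of wrapping the lookup so that case is handled explicitly. The helper latest_offsets and the FakeConsumer class are hypothetical names introduced only so the example runs without a broker. FakeConsumer imitates just the two consumer methods used here.

```python
from collections import namedtuple

# Stand-in for kafka.TopicPartition (assumption: same field names).
TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])


def latest_offsets(consumer, topic):
    """Return {partition id: end offset} for topic, or {} if unknown."""
    partition_ids = consumer.partitions_for_topic(topic)
    if partition_ids is None:
        # No metadata for this topic: avoid iterating over None.
        return {}
    tps = [TopicPartition(topic, p) for p in sorted(partition_ids)]
    ends = consumer.end_offsets(tps)
    return {tp.partition: offset for tp, offset in ends.items()}


class FakeConsumer:
    """Hypothetical stand-in exposing the two methods used above."""

    def __init__(self, ends):
        self._ends = ends  # {topic: {partition id: end offset}}

    def partitions_for_topic(self, topic):
        parts = self._ends.get(topic)
        return set(parts) if parts is not None else None

    def end_offsets(self, partitions):
        return {tp: self._ends[tp.topic][tp.partition] for tp in partitions}


fake = FakeConsumer({"python-test": {0: 5, 1: 20, 2: 0}})
print(latest_offsets(fake, "python-test"))
print(latest_offsets(fake, "missing-topic"))
```

With a real KafkaConsumer, TopicPartition would be imported from kafka instead of using the stand-in, and an empty result could be treated as a signal to retry or to check the topic name.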

How do I get the latest offset from each partition using kafka-python?