How to get the latest offset of each partition with kafka-python?

Date: 2019-04-24 14:01:05

Tags: python apache-kafka

I'm trying to get the latest offset (not the committed offset) of each partition of a given topic.

from kafka import KafkaConsumer, TopicPartition

topic = 'test-topic'
broker = 'localhost:9092'

consumer = KafkaConsumer(bootstrap_servers=broker)

tp = TopicPartition(topic, 0)        #1
consumer.assign([tp])                #2
consumer.seek_to_end(tp)             #3
last_offset = consumer.position(tp)  #4

for i in consumer.partitions_for_topic(topic):
    tp = TopicPartition(topic, i)
    consumer.assign([tp])
    consumer.seek_to_end(tp)
    last_offset = consumer.position(tp)
    print(last_offset)

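The code above does work and prints the offset of each partition. However, note that the same 4 lines appear both before the loop and inside it. If I remove any of lines #1-#4 (the 4 lines before the for loop), I get this error:

File "check_kafka_offset.py", line 19, in <module>
    for i in consumer.partitions_for_topic(topic):
TypeError: 'NoneType' object is not iterable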

Why do I need those 4 lines before the for loop?

1 answer:

Answer 0 (score: 1)

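You can use the client's end_offsets(partitions) method to get the last offset of each of the specified partitions. Note that the returned offset is the *next* offset, i.e. the offset of the last message + 1. See the kafka-python documentation for KafkaConsumer.end_offsets.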
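A minimal sketch of the same idea applied to the question's loop, assuming kafka-python's KafkaConsumer.topics(), partitions_for_topic() and end_offsets(); the helper name latest_offsets is hypothetical. It calls topics() first because, in at least some kafka-python versions, partitions_for_topic() only consults metadata the consumer has already cached and returns None for a topic it has not looked up yet. The assign()/seek_to_end()/position() calls in the question force such a lookup as a side effect, which would explain why removing lines #1-#4 produces the NoneType error.

```python
def latest_offsets(consumer, topic):
    """Return {partition_id: end_offset} for every partition of `topic`.

    `consumer` is assumed to be a kafka.KafkaConsumer. No assign() or
    seek_to_end() is needed. Each returned offset is the offset the *next*
    message in that partition will get.
    """
    from kafka import TopicPartition  # kafka-python

    # Force a metadata refresh so partitions_for_topic() does not return None
    # just because this consumer has not looked the topic up yet.
    consumer.topics()
    partition_ids = consumer.partitions_for_topic(topic)
    if partition_ids is None:
        raise ValueError("topic %r not found in cluster metadata" % topic)

    tps = [TopicPartition(topic, p) for p in sorted(partition_ids)]
    # One end_offsets() call covers all partitions at once.
    end = consumer.end_offsets(tps)
    return {tp.partition: end[tp] for tp in tps}
```

With the question's settings this would be called as latest_offsets(KafkaConsumer(bootstrap_servers='localhost:9092'), 'test-topic'), replacing both the 4 lines before the loop and the per-partition assign()/seek_to_end() loop.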

Edit: example implementation:

from kafka import KafkaProducer, KafkaConsumer, TopicPartition
from kafka.errors import KafkaError
import json
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

BOOTSTRAP="""cluster:9092"""
API_KEY="""redacted"""
API_SECRET="""redacted"""
TOPIC="python-test"

consumer = KafkaConsumer(
    group_id="my-group",
    bootstrap_servers=[BOOTSTRAP],
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username=API_KEY,
    sasl_plain_password=API_SECRET,
    value_deserializer=lambda m: json.loads(m.decode('ascii')),
    auto_offset_reset='earliest'
)

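# partitions_for_topic() may return None if this consumer has not fetched
# metadata for TOPIC yet; topics() forces a metadata refresh first.
consumer.topics()
PARTITIONS = [TopicPartition(TOPIC, p) for p in consumer.partitions_for_topic(TOPIC)]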

partitions = consumer.end_offsets(PARTITIONS)
print(partitions)

The end_offsets result looks like this:

{TopicPartition(topic=u'python-test', partition=0): 5,
 TopicPartition(topic=u'python-test', partition=1): 20,
 TopicPartition(topic=u'python-test', partition=2): 0}
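Because these values are "next" offsets, a small sketch like the following converts them into the offset of the last message written per partition. The helper name last_written_offsets is hypothetical, and the namedtuple is a stand-in with the same fields as kafka-python's TopicPartition so the example runs without a broker. It assumes a non-transactional topic (the last offset could otherwise be a transaction marker), and with retention that last message may already have been deleted.

```python
from collections import namedtuple

# Stand-in with the same fields as kafka-python's TopicPartition
# (topic, partition), so this sketch runs without kafka-python or a broker.
TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])


def last_written_offsets(end_offsets):
    """Turn an end_offsets() result into {partition: offset of last message}.

    end_offsets() reports the offset the *next* message will get, so the last
    message written to a partition sits at end - 1. An end offset of 0 means
    nothing has ever been written, which is mapped to None here.
    """
    return {
        tp.partition: (end - 1 if end > 0 else None)
        for tp, end in end_offsets.items()
    }


# The end_offsets result shown above.
sample = {
    TopicPartition("python-test", 0): 5,
    TopicPartition("python-test", 1): 20,
    TopicPartition("python-test", 2): 0,
}
print(last_written_offsets(sample))  # {0: 4, 1: 19, 2: None}
```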