I am trying to get the latest offset (not the committed offset) from each partition of a given topic.
from kafka import KafkaConsumer, TopicPartition
topic = 'test-topic'
broker = 'localhost:9092'
consumer = KafkaConsumer(bootstrap_servers=broker)
tp = TopicPartition(topic, 0) #1
consumer.assign([tp]) #2
consumer.seek_to_end(tp) #3
last_offset = consumer.position(tp) #4
for i in consumer.partitions_for_topic(topic):
    tp = TopicPartition(topic, i)
    consumer.assign([tp])
    consumer.seek_to_end(tp)
    last_offset = consumer.position(tp)
    print(last_offset)
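The code above does work and prints the offset for each partition. However, notice that I have the same 4 lines both before the loop and inside it. If I remove any of the lines #1-#4 (the 4 lines before the for loop), I get this error:
  File "check_kafka_offset.py", line 19, in <module>
    for i in consumer.partitions_for_topic(topic):
TypeError: 'NoneType' object is not iterable
Why do I need the 4 lines before the for loop?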
Answer 0 (score: 1)
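You can use the consumer's end_offsets(partitions) function to get the last offset for the specified partitions. Note that the returned offset is the next offset, that is, the offset of the last message + 1. See the kafka-python documentation for end_offsets.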
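To make the "next offset" semantics concrete, here is a minimal sketch that converts an end_offsets-style mapping into the offset of the last message written to each partition. The helper name last_written_offsets is hypothetical, and the local TopicPartition namedtuple only stands in for kafka.TopicPartition so the snippet runs without a broker. An end offset of 0 is treated as "nothing written yet"; the sketch does not try to detect partitions emptied by retention.

```python
from collections import namedtuple

# Stand-in for kafka.TopicPartition (assumption: same two fields),
# so the example runs without a broker.
TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])


def last_written_offsets(end_offsets):
    """Map each partition to its last message offset, or None if the end offset is 0."""
    return {
        tp: (end - 1 if end > 0 else None)
        for tp, end in end_offsets.items()
    }


# Sample values shaped like the dict returned by consumer.end_offsets(...).
sample = {
    TopicPartition("python-test", 0): 5,
    TopicPartition("python-test", 1): 20,
    TopicPartition("python-test", 2): 0,
}
last = last_written_offsets(sample)
for tp, offset in sorted(last.items()):
    print(tp.partition, offset)
```

With a real consumer you would pass the dict returned by consumer.end_offsets(...) instead of the sample values.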
Edit: example implementation:
from kafka import KafkaConsumer, TopicPartition
import json
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
BOOTSTRAP="""cluster:9092"""
API_KEY="""redacted"""
API_SECRET="""redacted"""
TOPIC="python-test"
consumer = KafkaConsumer(
    group_id="my-group",
    bootstrap_servers=[BOOTSTRAP],
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username=API_KEY,
    sasl_plain_password=API_SECRET,
    value_deserializer=lambda m: json.loads(m.decode('ascii')),
    auto_offset_reset='earliest'
)
PARTITIONS = []
for partition in consumer.partitions_for_topic(TOPIC):
    PARTITIONS.append(TopicPartition(TOPIC, partition))
partitions = consumer.end_offsets(PARTITIONS)
print(partitions)
The result of end_offsets looks like this:
{TopicPartition(topic=u'python-test', partition=0): 5,
 TopicPartition(topic=u'python-test', partition=1): 20,
 TopicPartition(topic=u'python-test', partition=2): 0}
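As for the TypeError in the question: it shows that partitions_for_topic can return None, which would be consistent with the consumer not yet holding metadata for the topic when the loop starts. Below is a defensive sketch that assumes calling consumer.topics() first prompts a metadata fetch; that assumption is worth verifying against your kafka-python version. partitions_or_raise is a hypothetical helper name, and FakeConsumer exists only so the snippet runs without a broker.

```python
def partitions_or_raise(consumer, topic):
    """Return the sorted partition ids for topic, or raise if none are known."""
    partitions = consumer.partitions_for_topic(topic)
    if partitions is None:
        # Assumption: topics() triggers a metadata refresh before the retry.
        consumer.topics()
        partitions = consumer.partitions_for_topic(topic)
    if partitions is None:
        raise LookupError("no metadata for topic %r" % topic)
    return sorted(partitions)


class FakeConsumer:
    """Hypothetical stand-in that returns None until topics() has been called."""

    def __init__(self, metadata):
        self._metadata = metadata
        self._fetched = False

    def topics(self):
        self._fetched = True
        return set(self._metadata)

    def partitions_for_topic(self, topic):
        if not self._fetched:
            return None
        return self._metadata.get(topic)


fake = FakeConsumer({"test-topic": {0, 1, 2}})
found = partitions_or_raise(fake, "test-topic")
print(found)
```

With a real KafkaConsumer you would then build [TopicPartition(topic, p) for p in partitions_or_raise(consumer, topic)] and pass that list to end_offsets.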