I am trying to consume data from Kafka brokers with the kafka-python library. Several producers publish at a high rate, but on the consumer side each message takes me about 5 seconds to process, so after finishing the first message I want to receive the latest message, not the next message after the last committed offset.
I tried setting enable_auto_commit=False together with auto_offset_reset="latest", tried a random group ID, and also tried group_id=None. The only effect is that I get the latest message when the consumer starts; after that every record still arrives in offset order instead of jumping to the newest data at the end of the queue.
import json
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers=kafka_brokers_address,
                         api_version=(2, 3, 0),
                         group_id='abcd',
                         value_deserializer=lambda v: json.loads(v.decode('utf-8')),
                         enable_auto_commit=False,
                         auto_offset_reset="latest")
consumer.assign([TopicPartition('topic', 0)])
c = next(consumer)

## also tried
for c in consumer:
    print(c.value)
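What I want is roughly the following (just a sketch: the topic name, partition 0 and the sleep stand in for my real topic and the ~5 seconds of processing):

import json
import time
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers=kafka_brokers_address,
                         value_deserializer=lambda v: json.loads(v.decode('utf-8')),
                         enable_auto_commit=False)
tp = TopicPartition('topic', 0)
consumer.assign([tp])

while True:
    # Skip past everything that piled up while the previous message was
    # being processed, so the next read returns only a brand-new record.
    consumer.seek_to_end(tp)
    record = next(consumer)
    time.sleep(5)  # stand-in for the ~5 s of processing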
Answer 0 (score: 1)

An example of how to move to the end, from https://github.com/dpkp/kafka-python/issues/1405:
import json

from kafka import KafkaConsumer, TopicPartition
from kafka.structs import OffsetAndMetadata

# `config` and `logger` come from the surrounding project.

def seek_to_last():
    consumer = KafkaConsumer(bootstrap_servers=config.kafka_bootstrap_server,
                             group_id=config.kafka_check_proxy_thread_group,
                             value_deserializer=lambda m: json.loads(m.decode('utf-8')),
                             auto_offset_reset='latest',
                             enable_auto_commit=True)
    partitions = consumer.partitions_for_topic(config.kafka_raw_proxy_topic)
    if len(partitions) > config.TASK_OK_PROXY_SCAN_THREAD_N:
        logger.error("...................")
    for partition in partitions:
        p = TopicPartition(config.kafka_raw_proxy_topic, partition)
        mypartition = [p]
        consumer.assign(mypartition)
        # consumer.seek_to_end(p)
        # Look up the end offset of the partition...
        last_pos = consumer.end_offsets(mypartition)
        pos = last_pos[p]
        logger.info("%s, %s" % (partition, pos))
        # consumer.seek(p, pos)
        # ...and commit it, so the group's stored position jumps to the tail.
        offset = OffsetAndMetadata(pos, b'')
        consumer.commit(offsets={p: offset})
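Usage sketch (hedged: consume_from_tail and handle_record are placeholder names, not part of the snippet above). Because seek_to_last() commits the end offset of every partition for the group, a consumer that later subscribes with the same group_id resumes from those committed offsets, i.e. from the tail of the topic rather than from the accumulated backlog:

def consume_from_tail():
    # Fast-forward the group's committed offsets to the current end of
    # every partition before starting to consume.
    seek_to_last()
    consumer = KafkaConsumer(config.kafka_raw_proxy_topic,
                             bootstrap_servers=config.kafka_bootstrap_server,
                             group_id=config.kafka_check_proxy_thread_group,
                             value_deserializer=lambda m: json.loads(m.decode('utf-8')),
                             enable_auto_commit=True)
    for record in consumer:
        handle_record(record)  # placeholder for the slow (~5 s) processing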