Is there a way to fetch the latest 1000 records/messages from a topic in Kafka? Similar to tail -f 1000 on a file in Linux?
Answer 0 (score: 1)
Using Python Kafka (kafka-python), I found this way to fetch the latest message. It can be configured to fetch the n latest messages, but make sure there are enough messages in the topic, in case it is nearly empty. That said, this looks like a job for stream processing, i.e. Kafka Streams or KSQL.
#!/usr/bin/env python
from kafka import KafkaConsumer, TopicPartition

TOPIC = 'example_topic'
GROUP = 'demo'
BOOTSTRAP_SERVERS = ['bootstrap.kafka:9092']

consumer = KafkaConsumer(
    bootstrap_servers=BOOTSTRAP_SERVERS,
    group_id=GROUP,
    # enable_auto_commit=False,
    auto_commit_interval_ms=0,
    max_poll_records=1
)

candidates = []
consumer.commit()
msg = None

partitions = consumer.partitions_for_topic(TOPIC)
for p in partitions:
    tp = TopicPartition(TOPIC, p)
    consumer.assign([tp])
    committed = consumer.committed(tp)
    consumer.seek_to_end(tp)
    last_offset = consumer.position(tp)
    print(f"\ntopic: {TOPIC} partition: {p} committed: {committed} "
          f"last: {last_offset} lag: {(last_offset - committed)}")
    consumer.poll(
        timeout_ms=100,
        # max_records=1
    )
    consumer.seek(tp, last_offset - 4)
    for message in consumer:
        print(message)
        # TODO find out why the number is -1
        if message.offset == last_offset - 1:
            candidates.append(message)
            # comment out if you don't want the offsets committed
            consumer.commit()
            break

print('\n\ngooch\n\n')
latest_msg = candidates[0]
for msg in candidates:
    print(f'finalists:\n {msg}')
    if msg.timestamp > latest_msg.timestamp:
        latest_msg = msg

consumer.close()
print(f'\n\nlatest_message:\n{latest_msg}')
I know that in Java/Scala Kafka Streams it is possible to create a table, i.e. a sub-topic holding only the last entry of another topic, so Confluent's Kafka library in C might offer a more elegant and efficient way. It has Python and Java bindings, and a kafkacat CLI.
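For a one-off tail from the command line, the kafkacat CLI mentioned above can do this directly: its -o flag accepts a negative value meaning an offset relative to the end of each partition. A sketch, reusing the broker address and topic name from the example script (adjust both for your cluster):

```shell
# -C: consume, -o -1000: start 1000 messages before the end of each
# partition, -e: exit once the end of the partition is reached
kafkacat -C -b bootstrap.kafka:9092 -t example_topic -o -1000 -e
```

Note that the offset is per partition, so on a multi-partition topic this prints up to 1000 messages from each partition, not 1000 overall.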
Answer 1 (score: 0)
You can use the seek method of the KafkaConsumer class: find the current end offset of each partition, then do the arithmetic to find the right starting offset.
Answer 2 (score: 0)
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer()
partition = TopicPartition('foo', 0)
start = 1234
end = 2345
consumer.assign([partition])
consumer.seek(partition, start)
for msg in consumer:
    if msg.offset > end:
        break
    print(msg)