Latest records/messages in a Kafka topic

Date: 2017-11-07 07:24:43

Tags: apache-kafka

Is there a way to get the latest 1000 records/messages from a topic in Kafka? Something like what tail -n 1000 does for a file in Linux?

3 Answers:

Answer 0 (score: 1)

I would use the Python Kafka client. I found the following way to fetch the latest message.

Configure it to fetch the n latest messages, but make sure there are enough messages in case the topic is empty. This really looks like a job for streaming, i.e. Kafka Streams or KSQL.

#!/usr/bin/env python
from kafka import KafkaConsumer, TopicPartition

TOPIC = 'example_topic'
GROUP = 'demo'
BOOTSTRAP_SERVERS = ['bootstrap.kafka:9092']

consumer = KafkaConsumer(
    bootstrap_servers=BOOTSTRAP_SERVERS,
    group_id=GROUP,
    # enable_auto_commit=False,
    auto_commit_interval_ms=0,
    max_poll_records=1
)

candidates = []
consumer.commit()

msg = None
partitions = consumer.partitions_for_topic(TOPIC)

for p in partitions:
    tp = TopicPartition(TOPIC, p)
    consumer.assign([tp])
    committed = consumer.committed(tp)
    consumer.seek_to_end(tp)
    last_offset = consumer.position(tp)
    # committed can be None if this group has never committed on the partition
    print(f"\ntopic: {TOPIC} partition: {p} committed: {committed} last: {last_offset} lag: {last_offset - (committed or 0)}")

    consumer.poll(
        timeout_ms=100,
        # max_records=1
    )

    # rewind a few messages before the end of the partition
    consumer.seek(tp, max(0, last_offset - 4))

    for message in consumer:
        # print(f"Message is of type: {type(message)}")
        print(message)
        # print(f'message.offset: {message.offset}')

        # position() after seek_to_end() is the offset of the *next* message,
        # so the newest message in this partition sits at last_offset - 1
        if message.offset == last_offset-1:
            candidates.append(message)
            # print(f'  {message}')

            # comment if you don't want the messages committed
            consumer.commit()
            break

print('\n\ngooch\n\n')

latest_msg = candidates[0]

for msg in candidates:
    print(f'finalists:\n {msg}')
    if msg.timestamp > latest_msg.timestamp:
        latest_msg = msg

consumer.close()


print(f'\n\nlatest_message:\n{latest_msg}')

I know that in Java/Scala, Kafka Streams can create a table, i.e. a sub-topic holding only the last entry of another topic, so the Confluent Kafka library written in C might offer a more elegant and efficient way. It has Python and Java bindings as well as the kafkacat CLI.
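For reference, here is a minimal sketch of the same "tail the last N messages" idea using the confluent-kafka Python binding mentioned above; the broker, topic, and group names are placeholders, and it simply seeks each partition back N messages from the high watermark rather than building a Streams-style table:

# Minimal sketch using the confluent-kafka Python binding (assumed installed);
# topic, broker and group names are placeholders.
from confluent_kafka import Consumer, TopicPartition

TOPIC = 'example_topic'
N = 1000  # how many trailing messages per partition

consumer = Consumer({
    'bootstrap.servers': 'bootstrap.kafka:9092',
    'group.id': 'demo-tail',
    'enable.auto.commit': False,
})

# Discover the topic's partitions from cluster metadata.
metadata = consumer.list_topics(TOPIC, timeout=10)
partitions = metadata.topics[TOPIC].partitions.keys()

assignment = []
for p in partitions:
    # get_watermark_offsets returns (low, high); high is the next offset to be written.
    low, high = consumer.get_watermark_offsets(TopicPartition(TOPIC, p), timeout=10)
    assignment.append(TopicPartition(TOPIC, p, max(low, high - N)))

consumer.assign(assignment)

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None:
        break  # nothing more within the timeout
    if msg.error():
        continue
    print(msg.topic(), msg.partition(), msg.offset(), msg.value())

consumer.close()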

Answer 1 (score: 0)

You can use the seek method of the KafkaConsumer class: find the current end offset of each partition, then do the arithmetic to find the correct starting offset.
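As a rough illustration of that approach (not from the original answer), here is a kafka-python sketch that uses beginning_offsets/end_offsets to do the arithmetic and then seeks each partition back N messages; broker and topic names are placeholders:

# Sketch of the seek-based approach with kafka-python;
# topic and broker names are placeholders.
from kafka import KafkaConsumer, TopicPartition

TOPIC = 'example_topic'
N = 1000  # tail length per partition

consumer = KafkaConsumer(bootstrap_servers=['bootstrap.kafka:9092'])

partitions = [TopicPartition(TOPIC, p) for p in consumer.partitions_for_topic(TOPIC)]
consumer.assign(partitions)

# Per-partition boundaries, returned as {TopicPartition: offset} maps.
begin_offsets = consumer.beginning_offsets(partitions)
end_offsets = consumer.end_offsets(partitions)

for tp in partitions:
    # Seek to "end - N", but never before the first available offset.
    consumer.seek(tp, max(begin_offsets[tp], end_offsets[tp] - N))

# Read each non-empty partition up to the end offset captured above.
remaining = {tp for tp in partitions if end_offsets[tp] > begin_offsets[tp]}
while remaining:
    for tp, records in consumer.poll(timeout_ms=1000).items():
        for record in records:
            print(record.partition, record.offset, record.value)
            if record.offset >= end_offsets[tp] - 1:
                remaining.discard(tp)

consumer.close()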

Answer 2 (score: 0)

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer()
partition = TopicPartition('foo', 0)
start = 1234
end = 2345
consumer.assign([partition])
consumer.seek(partition, start)
for msg in consumer:
    if msg.offset > end:
        break
    else:
        print(msg)

source
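To connect this snippet to the original question (the last 1000 messages), start and end could be derived from the partition boundaries; a hypothetical example, assuming the consumer and partition objects from the snippet above:

# Hypothetical way to fill in start/end for "the last 1000 messages";
# reuses the consumer and partition objects from the snippet above.
end = consumer.end_offsets([partition])[partition]      # next offset to be written
start = max(end - 1000, consumer.beginning_offsets([partition])[partition])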