Question

I have a topic with 40 partitions. My setup is as follows:

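from confluent_kafka import Consumer

def on_assign(c, ps):
    # Rewind every newly assigned partition to offset 0
    for p in ps:
        p.offset = 0
    print(ps)
    # Re-assign so the client fetches from the overridden offsets
    c.assign(ps)

conf = {'bootstrap.servers': 'localhost:9092',
        'enable.auto.commit': False,
        'group.id': 'confluent_consumer',
        'default.topic.config': {'auto.offset.reset': 'earliest'}
        }
consumer = Consumer(conf)
consumer.subscribe(['topic.source'], on_assign=on_assign)

msg = consumer.poll(timeout=100000)
# poll() returns None on timeout; msg.error() is set for error events
if msg is not None and not msg.error():
    print("Topic is %s | Partition is %d | Offset is %d | key is %s" % (msg.topic(), msg.partition(), msg.offset(), msg.key()))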

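I want to read all partitions of the topic topic.source from offset 0, but that is not what happens for every partition. For some partitions the consumer starts from a specific offset, which I assume is the committed offset. Changing group.id each time did not help. How can I read all partitions of the topic from the beginning, regardless of the committed offsets?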

I printed ps inside on_assign(), and it shows something like this for all 40 partitions:

[TopicPartition{topic=topic.source,partition=0,offset=0,error=None},TopicPartition{topic=topic.source,partition=1,offset=0,error=None}....] and so on

Answer 1

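If you set group.id to a new value, or use a group that has not committed any offsets, with auto.offset.reset set to earliest, the consumer will start at the beginning of the partitions.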
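A minimal sketch of that configuration route, assuming a broker at localhost:9092 as in the question. The helper `fresh_group_config` and the `confluent_consumer-` prefix are hypothetical names chosen for illustration, not part of any API. A random suffix from `uuid.uuid4()` gives each run a group that has never committed offsets, so `auto.offset.reset` decides where every partition starts. `auto.offset.reset` is placed at the top level of the config, which recent confluent-kafka versions accept; the `default.topic.config` form in the question is the older style.

```python
import uuid


def fresh_group_config(bootstrap_servers: str, prefix: str = "confluent_consumer") -> dict:
    """Hypothetical helper: build a consumer config with a never-used group.id.

    Because the group is new, the broker has no committed offsets for it,
    so auto.offset.reset=earliest applies to every assigned partition.
    """
    return {
        "bootstrap.servers": bootstrap_servers,
        # Unique per call, so no committed offsets exist for this group yet.
        "group.id": f"{prefix}-{uuid.uuid4()}",
        # Do not commit offsets, so this run leaves no state behind.
        "enable.auto.commit": False,
        # With no committed offset, start at the earliest retained message.
        "auto.offset.reset": "earliest",
    }


if __name__ == "__main__":
    conf = fresh_group_config("localhost:9092")
    print(conf["group.id"])
```

With the confluent-kafka package installed, the result is passed straight to the constructor, e.g. `Consumer(fresh_group_config("localhost:9092"))`, followed by `subscribe(["topic.source"])`.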

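That said, the beginning may not be offset 0. Depending on your broker's log retention settings, Kafka may have deleted older messages, so the first available message in a partition can be at any offset.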
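One way to rewind regardless of committed offsets, sketched under the assumption that the confluent-kafka Python client is used: in the `on_assign` callback, set each partition's offset to the logical offset `OFFSET_BEGINNING` instead of the literal `0`, then call `assign()` with the modified list. `OFFSET_BEGINNING` resolves to the first message still retained, which covers the case where retention has already deleted offset 0. The factory `make_on_assign` and the function `main` are hypothetical helpers; the start offset is a parameter so the callback can be exercised without a broker.

```python
from typing import Callable, List


def make_on_assign(start_offset: int) -> Callable:
    """Hypothetical factory: return an on_assign callback that rewinds
    every assigned partition to start_offset."""

    def on_assign(consumer, partitions: List) -> None:
        for p in partitions:
            # Override any committed offset for this partition.
            p.offset = start_offset
        # Assigning the modified list makes the client fetch from these offsets.
        consumer.assign(partitions)

    return on_assign


def main() -> None:
    # Requires the confluent-kafka package and a reachable broker.
    from confluent_kafka import OFFSET_BEGINNING, Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "confluent_consumer",
        "enable.auto.commit": False,
    })
    # OFFSET_BEGINNING means "earliest retained offset", not literally 0.
    consumer.subscribe(["topic.source"], on_assign=make_on_assign(OFFSET_BEGINNING))
    try:
        while True:
            msg = consumer.poll(timeout=1.0)
            if msg is None:
                continue  # no message within the timeout
            if msg.error():
                print(msg.error())
                continue
            print(msg.topic(), msg.partition(), msg.offset(), msg.key())
    finally:
        consumer.close()
```

Calling `main()` against a running broker consumes every partition from its first retained message; the `while True` loop runs until interrupted, and `close()` then leaves the group cleanly.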

Confluent Kafka: consumer does not read all partitions of a topic from the beginning

1 个答案: