How do I assign all partitions of a Kafka topic to a consumer in Python 3.6?

Asked: 2021-02-03 20:15:31

Tags: apache-kafka python-3.6

I am running Python 3.6 and kafka-python 1.3.5.

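I have a Kafka topic with partitions 0-15, all of which currently contain messages. I want my consumer to read and fetch messages from every partition. This is how I am currently trying to do it: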

import kafka
consumer = kafka.KafkaConsumer(
    bootstrap_servers='broker1:9092,broker2:9092,broker3:9092',
    sasl_mechanism='PLAIN',
    enable_auto_commit=False,
    group_id="my_test_group",
    auto_offset_reset='latest',
    max_poll_records=2000,
    heartbeat_interval_ms=2000,
    consumer_timeout_ms=1000,
)
kafka_partition0 = kafka.TopicPartition('my_topic', 0)
kafka_partition1 = kafka.TopicPartition('my_topic', 1)
kafka_partition2 = kafka.TopicPartition('my_topic', 2)
kafka_partition3 = kafka.TopicPartition('my_topic', 3)
kafka_partition4 = kafka.TopicPartition('my_topic', 4)
kafka_partition5 = kafka.TopicPartition('my_topic', 5)
kafka_partition6 = kafka.TopicPartition('my_topic', 6)
kafka_partition7 = kafka.TopicPartition('my_topic', 7)
kafka_partition8 = kafka.TopicPartition('my_topic', 8)
kafka_partition9 = kafka.TopicPartition('my_topic', 9)
kafka_partition10 = kafka.TopicPartition('my_topic', 10)
kafka_partition11 = kafka.TopicPartition('my_topic', 11)
kafka_partition12 = kafka.TopicPartition('my_topic', 12)
kafka_partition13 = kafka.TopicPartition('my_topic', 13)
kafka_partition14 = kafka.TopicPartition('my_topic', 14)
kafka_partition15 = kafka.TopicPartition('my_topic', 15)
consumer.assign([kafka_partition0, kafka_partition1, kafka_partition2, kafka_partition3, kafka_partition4, kafka_partition5, kafka_partition6, kafka_partition7, kafka_partition8, kafka_partition9, kafka_partition10, kafka_partition11, kafka_partition12, kafka_partition13, kafka_partition14, kafka_partition15])
messages = consumer.poll()

However, when I look at the keys in the messages variable, I only see messages from partitions 7, 11, and 15.

Why is this happening?

1 Answer:

Answer 0 (score: 1)

Building the list by hand seems unnecessary (a list comprehension would at least make the code shorter).
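
As a rough sketch of the list-comprehension version: the helper name `all_partitions` is made up for illustration, and the partition count of 16 is taken from the question. The `namedtuple` fallback is only there so the snippet can run without kafka-python installed.

```python
from collections import namedtuple

try:
    from kafka import TopicPartition
except ImportError:
    # Stand-in with the same two fields, only so this sketch runs
    # on a machine without kafka-python installed.
    TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])


def all_partitions(topic, count):
    """Build one TopicPartition per partition id in 0..count-1 (hypothetical helper)."""
    return [TopicPartition(topic, p) for p in range(count)]


partitions = all_partitions("my_topic", 16)
# With a real consumer this would replace the sixteen hand-written variables:
# consumer.assign(partitions)
```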

That said, the basic example in the documentation already consumes from all partitions:

from kafka import KafkaConsumer
consumer = KafkaConsumer('my_favorite_topic', bootstrap_servers='localhost:9092', group_id='my_favorite_group')
for msg in consumer:
    print(msg)

"When I look at the keys in the messages variable, I only see messages from partitions 7, 11, and 15"

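How often are you checking? The consumer should cycle through the partitions as it consumes, so a single poll() will not necessarily return records from every one of them.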
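
One way to check this is to poll several times and tally records per partition instead of inspecting a single `poll()` result. The sketch below assumes a consumer that is already assigned or subscribed. `count_by_partition`, `rounds`, and the one-second timeout are illustrative choices, not part of kafka-python.

```python
from collections import Counter


def count_by_partition(consumer, rounds=10, timeout_ms=1000):
    """Poll repeatedly and count how many records each partition returned.

    `consumer` is expected to behave like kafka.KafkaConsumer: poll()
    returns a dict mapping TopicPartition to a list of records.
    """
    seen = Counter()
    for _ in range(rounds):
        batch = consumer.poll(timeout_ms=timeout_ms)
        for tp, records in batch.items():
            seen[tp.partition] += len(records)
    return seen


# Hypothetical usage with the consumer from the question:
# print(sorted(count_by_partition(consumer).items()))
```

If other partition ids show up after a few rounds, the original single call simply returned before data from those partitions had been fetched.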

Is data actually being produced to the other partitions? If not, there is nothing to poll from them.
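
To see which partitions hold data, one option is to compare each partition's earliest and latest offsets. This is only a sketch: `offset_span_by_partition` is a hypothetical helper, it relies on `partitions_for_topic`, `beginning_offsets`, and `end_offsets` as provided by `KafkaConsumer` in kafka-python, and the difference is just an approximation of the retained message count. Also keep in mind that the question sets `auto_offset_reset='latest'`, so a group with no committed offsets would likely start at the end of each partition and only see messages produced afterwards.

```python
from collections import namedtuple

try:
    from kafka import TopicPartition
except ImportError:
    # Stand-in so the sketch runs without kafka-python installed.
    TopicPartition = namedtuple("TopicPartition", ["topic", "partition"])


def offset_span_by_partition(consumer, topic):
    """Return {partition_id: end_offset - beginning_offset} for a topic.

    A span of 0 means the partition currently retains no messages.
    """
    ids = consumer.partitions_for_topic(topic) or set()
    tps = [TopicPartition(topic, p) for p in sorted(ids)]
    start = consumer.beginning_offsets(tps)
    end = consumer.end_offsets(tps)
    return {tp.partition: end[tp] - start[tp] for tp in tps}


# Hypothetical usage with the consumer from the question:
# print(offset_span_by_partition(consumer, 'my_topic'))
```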