Question

The Kafka consumer fails to commit offsets for one particular partition only.

[2019-01-04 12:22:22,691] ERROR [Consumer clientId=consumer-1, groupId=console-consumer-11955] Offset commit failed on partition my-topic-2-9 at offset 0: The request timed out. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2019-01-04 12:22:28,617] ERROR [Consumer clientId=consumer-1, groupId=console-consumer-11955] Offset commit failed on partition my-topic-2-9 at offset 1: The request timed out. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2019-01-04 12:23:18,875] ERROR [Consumer clientId=consumer-1, groupId=console-consumer-11955] Offset commit failed on partition my-topic-2-9 at offset 1: The request timed out. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

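Can anyone explain this error to me, and what might be causing it?

Our cluster has 5 brokers running in AWS. We use Apache Kafka 2.1.

I am running a very simple Kafka console producer and consuming the same messages with the Kafka console consumer.

I see this error after the console consumer consumes a message.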

// PRODUCER
./bin/kafka-console-producer.sh   --broker-list kafka1:9092   --topic my-topic-2 --property "parse.key=true"   --property "key.separator=,"

// CONSUMER
./bin/kafka-console-consumer.sh --bootstrap-server kafka1:9092 --from-beginning --topic my-topic-2 --property="print.key=true"

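Note that our cluster has more than 200 topics, with many producers and consumers.

I just can't understand this behavior.

Here is a screenshot from Grafana.

Edit:

Feel free to ask for more details. This error is really frustrating.

Edit 2: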

./bin/kafka-topics.sh --describe --zookeeper zookeeper1:2181/kafka --topic my-topic-2
Topic:my-topic-2    PartitionCount:10   ReplicationFactor:3 Configs:
Topic: my-topic-2   Partition: 0    Leader: 4   Replicas: 4,2,3 Isr: 4,2,3
Topic: my-topic-2   Partition: 1    Leader: 5   Replicas: 5,3,4 Isr: 5,4,3
Topic: my-topic-2   Partition: 2    Leader: 1   Replicas: 1,4,5 Isr: 1,4,5
Topic: my-topic-2   Partition: 3    Leader: 2   Replicas: 2,5,1 Isr: 2,1,5
Topic: my-topic-2   Partition: 4    Leader: 3   Replicas: 3,1,2 Isr: 3,2,1
Topic: my-topic-2   Partition: 5    Leader: 4   Replicas: 4,3,5 Isr: 4,3,5
Topic: my-topic-2   Partition: 6    Leader: 5   Replicas: 5,4,1 Isr: 5,4,1
Topic: my-topic-2   Partition: 7    Leader: 1   Replicas: 1,5,2 Isr: 1,2,5
Topic: my-topic-2   Partition: 8    Leader: 2   Replicas: 2,1,3 Isr: 2,3,1
Topic: my-topic-2   Partition: 9    Leader: 3   Replicas: 3,2,4 Isr: 3,2,4

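Edit 3:

I am more interested in understanding the possible causes of this problem, since that may help us uncover other issues with the cluster.

Edit 4:

All brokers, consumers, and producers are in the same VPC in the same region.
I know the offset commit timeout can be increased, but why should that be necessary? What is causing this latency? 5000 ms is far too long for a system that is supposed to be real-time.
The Kafka brokers might be overloaded or the network congested, but why? As you can see, the data input rate peaks at 2-3 mbps. Is that too much for a Kafka cluster of 5 machines (r5.xlarge)? Tell me if it is; I am new to Kafka.
What could become a bottleneck in a setup like this?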

Answer 1

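What is the ratio of consumer threads to topic partitions?

In my cluster I found that this error is more likely to occur when a small number of consumer threads consume a large number of partitions (for example, 1 thread assigned to 30 topic partitions).

For me, the configuration that best made this error go away was 1:1 (1 consumer thread per topic partition), but then I ran into scaling problems when I wanted to add more consumer threads to the group.

I dealt with that by developing a consumer deployment mechanism that enforces the 1:1 ratio. For example, when 3 consumers are deployed to consume 30 partitions, each opens 10 threads; to scale out, say to 10 consumers, each opens 3 threads.

I don't know whether I am following best practice, but it does the job for now.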
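
To make the arithmetic concrete, here is a rough sketch of how a deployment script could work out the thread count per consumer instance. The function name `threads_per_instance` is made up for illustration and is not part of Kafka or of my actual tooling. It assumes you already know the partition count and the number of deployed instances.

```shell
#!/bin/sh
# Hypothetical helper (not a Kafka tool): number of consumer threads each
# instance should open so the group has at least one thread per partition.
threads_per_instance() {
    partitions=$1
    instances=$2
    # Ceiling division, so no partition is left without its own thread.
    echo $(( (partitions + instances - 1) / instances ))
}

threads_per_instance 30 3    # 30 partitions, 3 instances  -> prints 10
threads_per_instance 30 10   # 30 partitions, 10 instances -> prints 3
```

When the partition count does not divide evenly, the rounding up means a few threads will sit idle, which seemed preferable to having one thread handle several partitions.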
