I'm using the Confluent Kafka REST product to consume records from a topic. My intention is to consume only the first 100 records in the topic. I'm using the following REST API to fetch records:
GET /consumers/testgroup/instances/my_consumer/records
How can this be achieved? Any ideas?
Answer 0 (score: 2)
As far as I'm aware this is not currently possible. As mentioned in the other answer, you can specify a max size in bytes (although this can actually be ignored by the brokers in some cases) but you cannot specify the desired number of messages.
However, such a feature can be easily implemented in your client code. You could guess a rough size, query the REST API and see how many messages you've received. If it's less than 100, then query it again to get the next few messages until you reach 100.
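A minimal sketch of that client-side loop. The `fetch_records` callable is a hypothetical wrapper around the GET request above (e.g. built with `requests`); only the accumulation logic is shown here:

```python
def consume_first_n(fetch_records, n=100, max_empty_polls=10):
    """Accumulate records from repeated REST-proxy polls until n are collected.

    fetch_records() is assumed to wrap
    GET /consumers/testgroup/instances/my_consumer/records
    and return a (possibly empty) list of records.
    """
    collected = []
    empty = 0
    while len(collected) < n and empty < max_empty_polls:
        batch = fetch_records()
        if batch:
            collected.extend(batch)
            empty = 0
        else:
            empty += 1  # give up after too many empty polls
    return collected[:n]  # drop any overshoot from the last batch
```

Note that the last poll may overshoot; the slice discards the excess, which is exactly why the other answer recommends manual offset management if you go this route.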
Answer 1 (score: 0)
If you are trying to consume a fresh batch of 100 messages from a consumer group, you should set max_bytes to a value that, for your data model, will always return roughly 100 records. You can take a conservative approach (undershoot, then fetch a little more until you reach exactly 100), or always fetch more than you need and ignore the excess. Either way, you should use manual offset management for the consumer group.
GET /consumers/testgroup/instances/my_consumer/records?max_bytes=300000
If you receive more than 100 messages and ignore the extras for whatever reason, then with offset auto-commit enabled (defined when the consumer is created) you will never receive them again on that consumer group. You probably don't want that to happen!
If you commit offsets manually, you can ignore whatever you want, as long as you commit the correct offsets to ensure no messages are lost. You can commit offsets manually like this:
POST /consumers/testgroup/instances/my_consumer/offsets HTTP/1.1
Host: proxy-instance.kafkaproxy.example.com
Content-Type: application/vnd.kafka.v2+json

{
  "offsets": [
    {
      "topic": "test",
      "partition": 0,
      "offset": <calculated offset ending where you stopped consuming for this partition>
    },
    {
      "topic": "test",
      "partition": 1,
      "offset": <calculated offset ending where you stopped consuming for this partition>
    }
  ]
}
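One way to calculate those per-partition offsets: in Kafka, the offset you commit is the offset of the *next* record the group should receive, i.e. the last consumed offset plus one. A sketch (the helper name and record shape are assumptions based on the proxy's records response):

```python
def commit_payload(consumed):
    """Build the offsets-commit JSON body from a list of consumed records.

    consumed: dicts with "topic", "partition" and "offset" keys, as in
    the REST proxy's records response. The committed offset for each
    partition is the highest consumed offset + 1.
    """
    last = {}
    for rec in consumed:
        key = (rec["topic"], rec["partition"])
        last[key] = max(last.get(key, -1), rec["offset"])
    return {
        "offsets": [
            {"topic": t, "partition": p, "offset": o + 1}
            for (t, p), o in sorted(last.items())
        ]
    }
```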
If you want to get exactly the first 100 records of a topic, you need to reset the consumer group's offsets for that topic and each of its partitions before consuming again. You can do it like this (taken from the Confluent docs):
POST /consumers/testgroup/instances/my_consumer/offsets HTTP/1.1
Host: proxy-instance.kafkaproxy.example.com
Content-Type: application/vnd.kafka.v2+json

{
  "offsets": [
    {
      "topic": "test",
      "partition": 0,
      "offset": 0
    },
    {
      "topic": "test",
      "partition": 1,
      "offset": 0
    }
  ]
}
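If the topic has many partitions, generating that reset body by hand gets tedious; a small helper (hypothetical name, assuming you know the partition count) can build it:

```python
def reset_to_beginning_payload(topic, num_partitions):
    """Build the offsets body that rewinds every partition of `topic` to 0."""
    return {
        "offsets": [
            {"topic": topic, "partition": p, "offset": 0}
            for p in range(num_partitions)
        ]
    }
```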
Answer 2 (score: -1)
You can configure the KafkaConsumer with the ConsumerConfig.MAX_POLL_RECORDS_CONFIG property. See the doc. (Note that this applies to the Java client, not to the REST proxy the question asks about.)