Question

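I have a Kafka consumer that connects to a topic with 3 partitions. Once I get a record from Kafka, I want to capture its offset and partition. On restart, I want to restore the consumer's position from the last offset read.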

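From the Kafka documentation:

Each record comes with its own offset, so to manage your own offset you just need to do the following:

Configure enable.auto.commit=false

Use the offset provided with each ConsumerRecord to save your position.

On restart, restore the position of the consumer using seek(TopicPartition, long).

Below is my sample code: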

constructor{    
    load data into offsetMap<partition,offset>
    initFlag=true;
}

Main method
{
    ConsumerRecords<String, String> records = consumer.poll(100);
    if(initFlag) // is this correct way to override offset position?
    {
        seekToPositions(offsetMap); 
        initFlag=false;
    }
    while(!shutdown)
    {
        for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
                getOffsetPositions();// dump offsets and partitions to db/disk
        }   
   }
}

// get the current offsets and write them to a file
public synchronized Map<Integer, Long> getOffsetPositions() throws Exception {
    Map<Integer, Long> offsetMap = new HashMap<Integer, Long>();
    // code to put partition and offset into map
    // write to disk or db
    return offsetMap;
}

// Overrides the fetch offsets that the consumer will use on the next poll
public synchronized void seekToPositions(Map<Integer, Long> offsetMap) {
    // code to get partitions and offsets from offsetMap
    consumer.seek(partition, offset);
}

Is this the right approach? Is there a better way?
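For the bookkeeping part of the question (dumping partitions and offsets to disk and reading them back on restart), here is a minimal sketch that uses only the JDK. The class name `FileOffsetStore` and the properties-file format are assumptions made for illustration, not part of the Kafka API. It stores the offset of the last processed record plus one, because `seek(TopicPartition, long)` sets the offset of the next record to be fetched.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper: persists "partition -> next offset to read" in a properties file. */
class FileOffsetStore {
    private final Path file;

    FileOffsetStore(Path file) {
        this.file = file;
    }

    /** Saves, per partition, the offset of the last processed record plus one. */
    void save(Map<Integer, Long> lastProcessed) {
        Properties props = new Properties();
        for (Map.Entry<Integer, Long> e : lastProcessed.entrySet()) {
            props.setProperty(String.valueOf(e.getKey()), String.valueOf(e.getValue() + 1));
        }
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "partition=nextOffset");
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Loads the saved positions; returns an empty map if nothing has been saved yet. */
    Map<Integer, Long> load() {
        Map<Integer, Long> next = new HashMap<>();
        if (!Files.exists(file)) {
            return next;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        for (String key : props.stringPropertyNames()) {
            next.put(Integer.valueOf(key), Long.valueOf(props.getProperty(key)));
        }
        return next;
    }
}
```

On restart, each loaded entry would be passed to `consumer.seek(new TopicPartition(topic, partition), nextOffset)` once the partitions are assigned. Because the file is written after a record is processed, a crash between processing and saving can lead to that record being read again.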

Answer 1

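If you commit your offsets, Kafka will store them for you (by default for up to 24 hours in the broker versions this answer refers to; the retention is controlled by the broker setting offsets.retention.minutes, and newer brokers default to a longer period).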

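That way, if your consumer dies, you can start the same code on another machine and continue from where you left off. No external storage is needed.

See the "Offsets and Consumer Position" section in https://kafka.apache.org/0102/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html

I also suggest you consider using commitSync.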
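As a rough sketch of the configuration this approach relies on, the settings below disable auto-commit and fix a consumer group, since committed offsets are stored per group. The class name `ManualCommitConfig`, the broker address, and the group id are placeholders chosen for illustration; the property keys are standard consumer settings.

```java
import java.util.Properties;

/** Hypothetical helper that builds consumer settings for manual offset commits. */
class ManualCommitConfig {
    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Committed offsets are stored per consumer group, so a restarted
        // consumer must use the same group.id to resume where it left off.
        props.setProperty("group.id", groupId);
        // The application decides when to commit (for example via commitSync).
        props.setProperty("enable.auto.commit", "false");
        // Applies only when the group has no valid committed offset for a partition.
        props.setProperty("auto.offset.reset", "earliest");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```

The resulting `Properties` object would be passed to `new KafkaConsumer<>(props)`, with `consumer.commitSync()` called after each batch of records has been processed.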

Answer 2

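This looks fine to me; just pay attention to how your consumer is built (manual partition assignment or automatic assignment).

If the partition assignment is done automatically, special care is needed to handle the case where partition assignments change. This can be done by providing a ConsumerRebalanceListener instance in the call to subscribe(Collection, ConsumerRebalanceListener) and subscribe(Pattern, ConsumerRebalanceListener). For example, when partitions are taken from a consumer, the consumer will want to commit its offset for those partitions by implementing ConsumerRebalanceListener.onPartitionsRevoked(Collection). When partitions are assigned to a consumer, the consumer will want to look up the offset for those new partitions and correctly initialize the consumer to that position by implementing ConsumerRebalanceListener.onPartitionsAssigned(Collection).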

https://kafka.apache.org/0101/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
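The revoke/assign logic described above can be sketched without the Kafka client on the classpath. In this sketch the `PartitionSeeker` interface is a hypothetical stand-in for `KafkaConsumer.seek(TopicPartition, long)`, the class `StoredOffsetRebalanceHandler` is invented for illustration, and a plain `Map` plays the role of the external offset store. In a real consumer the two callback methods would live in a `ConsumerRebalanceListener` and receive `TopicPartition` objects rather than partition numbers.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for KafkaConsumer.seek(TopicPartition, long). */
interface PartitionSeeker {
    void seek(int partition, long offset);
}

/** Mirrors the two rebalance callbacks for offsets kept in an external store. */
class StoredOffsetRebalanceHandler {
    private final Map<Integer, Long> store;                      // external store: partition -> next offset
    private final Map<Integer, Long> pending = new HashMap<>();  // positions not yet written to the store
    private final PartitionSeeker seeker;

    StoredOffsetRebalanceHandler(Map<Integer, Long> store, PartitionSeeker seeker) {
        this.store = store;
        this.seeker = seeker;
    }

    /** Call after a record has been fully processed. */
    void recordProcessed(int partition, long offset) {
        pending.put(partition, offset + 1); // next offset to read
    }

    /** Counterpart of onPartitionsRevoked: persist positions before ownership moves. */
    void onPartitionsRevoked(Collection<Integer> partitions) {
        for (int p : partitions) {
            Long next = pending.remove(p);
            if (next != null) {
                store.put(p, next);
            }
        }
    }

    /** Counterpart of onPartitionsAssigned: seek each new partition to its stored position. */
    void onPartitionsAssigned(Collection<Integer> partitions) {
        for (int p : partitions) {
            Long next = store.get(p);
            if (next != null) {
                seeker.seek(p, next);
            }
            // Partitions without a stored position are left at the consumer's default.
        }
    }
}
```

Seeking inside the assignment callback, rather than once after the first poll, means the stored position is applied every time a partition is handed to this consumer, not only at startup.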

Answer 3

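This can be solved by controlling the offsets we commit.

The first thing to do is to set "enable.auto.commit" to "false" in the consumer application, so that you control when offsets are committed.

We use a Map to track the offsets manually, as shown below: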

Map<TopicPartition, OffsetAndMetadata> currentOffsets = new HashMap<>();

    // topic is assumed to be a String; subscribe() expects a collection of topic names
    consumer.subscribe(Collections.singletonList(topic), new CommitCurrentOffset());

    try {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                // process the record (e.g. save to a DB, call an external service)

                // the committed offset is that of the next record to read, hence + 1
                currentOffsets.put(new TopicPartition(record.topic(), record.partition()),
                                   new OffsetAndMetadata(record.offset() + 1, null));  // 1
            }
            consumer.commitAsync(currentOffsets, null);  // 2
        }
    }
    finally {
        consumer.commitSync(currentOffsets);  // 3
        consumer.close();
    }

  class CommitCurrentOffset implements ConsumerRebalanceListener {  // 4
     @Override
     public void onPartitionsRevoked(Collection<TopicPartition> topicPartitions) {
       consumer.commitSync(currentOffsets);
     }

     @Override
     public void onPartitionsAssigned(Collection<TopicPartition> topicPartitions) {
       // nothing to do: the consumer resumes from the committed offsets
     }
  }

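The numbered comments in the code correspond to the following points:

1. As we process each message, we record the offset of the processed message in our map, as follows:

   currentOffsets.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1, null));

2. We commit the offsets of the processed messages to the broker asynchronously.
3. If any error/exception occurs while processing a message, we commit the offset of the latest message processed for each partition.
4. When we are about to lose a partition because of a rebalance, we need to commit offsets. Here we commit the latest offsets we have processed (in the for-each loop), not the latest offsets in the batch we are still processing. We do this by implementing the ConsumerRebalanceListener interface. Whenever a rebalance is triggered, the onPartitionsRevoked() method is called before the rebalance starts and after the consumer stops processing messages.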

Kafka restart from the same offset
