Kafka 0.9 / Java: consumer skips offsets during application restart

Asked: 2016-12-16 05:09:14

Tags: java apache-kafka kafka-consumer-api

I have a Java application with the following consumer properties:

kafkaProperties = new Properties();
kafkaProperties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaBrokersList);
kafkaProperties.put(ConsumerConfig.GROUP_ID_CONFIG, consumerGroupName);
kafkaProperties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
kafkaProperties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
kafkaProperties.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, consumerSessionTimeoutMs);
kafkaProperties.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, maxPartitionFetchBytes);
kafkaProperties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);

I create 15 consumer threads, each running the Runnable below. No other consumer uses this consumer group name.

@Override
public void run() {
    try {
        logger.info("Starting ConsumerWorker, consumerId={}", consumerId);
        consumer.subscribe(Arrays.asList(kafkaTopic), offsetLoggingCallback);
        while (true) {
            boolean isPollFirstRecord = true;
            logger.debug("consumerId={}; about to call consumer.poll() ...", consumerId);
            ConsumerRecords<String, String> records = consumer.poll(pollIntervalMs);
            Map<Integer,Long> partitionOffsetMap = new HashMap<>();
            for (ConsumerRecord<String, String> record : records) {

                if (isPollFirstRecord) {
                    isPollFirstRecord = false;
                    logger.info("Start offset for partition {} in this poll : {}", record.partition(), record.offset());
                }
                messageProcessor.processMessage(record.value(), record.offset());
                partitionOffsetMap.put(record.partition(),record.offset());
            }
            if (!records.isEmpty()) {
                logger.info("Invoking commit for partition/offset : {}", partitionOffsetMap);
                consumer.commitAsync(offsetLoggingCallback);
            }
        }
    } catch (WakeupException e) {
        logger.warn("ConsumerWorker [consumerId={}] got WakeupException - exiting ... Exception: {}",
                consumerId, e.getMessage());
    } catch (Exception e) {
        logger.error("ConsumerWorker [consumerId={}] got Exception - exiting ... Exception: {}",
                consumerId, e.getMessage());
    } finally {
        logger.warn("ConsumerWorker [consumerId={}] is shutting down ...", consumerId);
        consumer.close();
    }
}
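One detail worth keeping in mind when reading the log line above: it prints record.offset() (the offset of the last record processed), while Kafka's convention is that a committed offset is the position of the *next* record to consume, i.e. last processed offset + 1. A minimal sketch of that arithmetic, using plain int/long stand-ins for TopicPartition and OffsetAndMetadata (the CommitMapSketch class below is illustrative, not part of the Kafka API):

```java
import java.util.Map;
import java.util.TreeMap;

public class CommitMapSketch {
    /** Given partition -> last processed offset, build the offsets to commit.
     *  Kafka expects the offset of the NEXT record to read, i.e. last + 1. */
    static Map<Integer, Long> toCommitMap(Map<Integer, Long> lastProcessed) {
        Map<Integer, Long> commit = new TreeMap<>(); // sorted, for stable output
        lastProcessed.forEach((partition, offset) -> commit.put(partition, offset + 1));
        return commit;
    }

    public static void main(String[] args) {
        Map<Integer, Long> lastProcessed = new TreeMap<>();
        lastProcessed.put(1, 10L);  // last record processed on partition 1 had offset 10
        lastProcessed.put(2, 57L);
        System.out.println(toCommitMap(lastProcessed)); // prints {1=11, 2=58}
    }
}
```

So an apparent off-by-one between the "Invoking commit" log and the restart position is expected; gaps of 90 or 1000+ are not.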

I also have an OffsetCommitCallbackImpl, shown below. It keeps each partition and its committed offset in a map, and it logs whenever offsets are committed.

@Override
public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
    if (exception == null) {
        offsets.forEach((topicPartition, offsetAndMetadata) -> {
            partitionOffsetMap.put(topicPartition, offsetAndMetadata);
            logger.info("Offset position during the commit for consumerId : {}, partition : {}, offset : {}",
                    Thread.currentThread().getName(), topicPartition.partition(), offsetAndMetadata.offset());
        });
    } else {
        offsets.forEach((topicPartition, offsetAndMetadata) -> logger.error(
                "Offset commit error, and partition offset info : {}, partition : {}, offset : {}",
                exception.getMessage(), topicPartition.partition(), offsetAndMetadata.offset()));
    }
}
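A side note on this callback: if the same OffsetCommitCallbackImpl instance (and therefore the same partitionOffsetMap) were ever shared across several of the 15 consumer threads, a plain HashMap would not be safe to update concurrently, and late async callbacks can arrive out of order. A sketch of a thread-safe tracker under that assumption (OffsetTracker is a hypothetical name; plain int/long stand in for TopicPartition/OffsetAndMetadata):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class OffsetTracker {
    // partition -> highest committed offset; ConcurrentHashMap lets multiple
    // threads record commit results without external locking
    private final Map<Integer, Long> committed = new ConcurrentHashMap<>();

    /** Record a successful commit. merge() keeps the highest offset seen,
     *  which also guards against out-of-order async commit callbacks. */
    public void onCommit(int partition, long offset) {
        committed.merge(partition, offset, Math::max);
    }

    public Long committedOffset(int partition) {
        return committed.get(partition);
    }

    public static void main(String[] args) {
        OffsetTracker tracker = new OffsetTracker();
        tracker.onCommit(3, 100L);
        tracker.onCommit(3, 90L); // a late, out-of-order callback does not regress the value
        System.out.println(tracker.committedOffset(3)); // prints 100
    }
}
```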

Problem: I noticed that whenever I bounce the application (shut it down and restart it), I miss events/messages. So I watched the logging closely. Comparing the offsets committed before shutdown (using the OffsetCommitCallback logging) against the offsets picked up for processing after restart, I can see that for some partitions we do not resume from the offset where we left off before shutdown. Sometimes the start offset for certain partitions is 1000+ ahead of the committed offset.

Note: this happens to roughly 8 of the 40 partitions.

If you look closely at the logging in the run method, there is a log statement where I print the offsets just before invoking the async commit. For example, if the last log before shutdown shows offset 10 for partition 1, then after restart the first offset we process for partition 1 is 100. I have confirmed that we are missing 90 messages entirely.
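One general point when comparing those "Invoking commit" log lines with what is seen after restart: a log line written just before commitAsync proves only that the request was handed off, not that it completed before the JVM exited. The sketch below is a plain-executor analogy of that fire-and-forget behavior, not Kafka client code: a "commit" queued behind a slow in-flight one is simply discarded on abrupt shutdown.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncCommitLossDemo {
    /** Queue a fire-and-forget "commit" behind a slow in-flight one, then shut
     *  down abruptly; returns how many queued commits never executed. */
    static int lostCommits() {
        ExecutorService commits = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch block = new CountDownLatch(1);
        // Stand-in for a slow in-flight commit request.
        commits.submit(() -> {
            started.countDown();
            try {
                block.await();
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            started.await(); // make sure the slow commit is actually running
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        // Stand-in for the commitAsync that was logged just before shutdown.
        commits.submit(() -> System.out.println("commit partition 1 @ 11"));
        // Abrupt shutdown: queued tasks are returned, never executed.
        List<Runnable> neverRan = commits.shutdownNow();
        return neverRan.size();
    }

    public static void main(String[] args) {
        System.out.println("commits lost on abrupt shutdown: " + lostCommits());
        // prints: commits lost on abrupt shutdown: 1
    }
}
```

In the analogy, the queued task plays the role of an async commit that was logged but never reached the broker.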

Can anyone think of a reason why this would happen?

0 answers