Question

Java, Kafka version 0.10.2.1

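My KafkaConsumer gets stuck in a state where it believes it still has some partitions assigned: it is able to poll() new messages, but it cannot commit them because of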

org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced ...

So it successfully polls again (?!) and fails to commit again, in a loop.

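There is a second consumer in the same group, running in a separate thread, that is in a correct state: it has all the partitions assigned and can commit offsets. Here is a log of what is happening (consumer-1 is the one in the broken state):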

2017-09-28 10:16:03,384 INFO  [         consumer-1] - Flushing size: 128, delay: 14 seconds, reason: periodical
2017-09-28 10:16:03,384 INFO  [         consumer-1] - Partition assignment [topic-15, topic-14, topic-11, topic-10, topic-13, topic-12, topic-9, topic-8]
2017-09-28 10:16:03,420 ERROR [         consumer-1] - Commit failed. Should rejoin the group on next poll()
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:702)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:581)
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1090)
    at cmy.package.ResourceProcessor.flushAndCommitOffset(ResourceProcessor.java:116)
    at cmy.package.ResourceProcessor.run(ResourceProcessor.java:89)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2017-09-28 10:16:13,998 INFO  [         consumer-1] - Flushing size: 6, delay: 10 seconds, reason: periodical
2017-09-28 10:16:13,999 INFO  [         consumer-1] - Partition assignment [topic-15, topic-14, topic-11, topic-10, topic-13, topic-12, topic-9, topic-8]
2017-09-28 10:16:13,999 INFO  [         consumer-0] - Flushing size: 164, delay: 19 seconds, reason: periodical
2017-09-28 10:16:13,999 INFO  [         consumer-0] - Partition assignment [topic-15, topic-14, topic-11, topic-10, topic-13, topic-12, topic-7, topic-6, topic-9, topic-8, topic-3, topic-2, topic-5, topic-4, topic-1, topic-0]
2017-09-28 10:16:14,003 ERROR [         consumer-1] - Commit failed. Should rejoin the group on next poll()
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:702)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:581)
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1090)
    at cmy.package.ResourceProcessor.flushAndCommitOffset(ResourceProcessor.java:116)
    at cmy.package.ResourceProcessor.run(ResourceProcessor.java:89)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
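
The exception text names two consumer settings, max.poll.interval.ms and max.poll.records. As a minimal sketch, assuming the consumer is configured through a plain Properties object as in the code below, they could be applied like this. The class name, method name, and values are hypothetical and purely illustrative. Although the message also suggests increasing the session timeout, as far as I know clients from 0.10.1 onward limit the time between poll() calls with max.poll.interval.ms rather than with the session timeout.

```java
import java.util.Properties;

// Hypothetical helper; the class name and any values passed in are assumptions.
final class ConsumerTimeoutConfig {
    private ConsumerTimeoutConfig() {}

    /** Returns a copy of base with the two settings named in the exception message applied. */
    static Properties withPollLimits(Properties base, long maxPollIntervalMs, int maxPollRecords) {
        Properties props = new Properties();
        props.putAll(base);
        // Upper bound on the time allowed between two consecutive poll() calls.
        props.setProperty("max.poll.interval.ms", Long.toString(maxPollIntervalMs));
        // Upper bound on the number of records a single poll() may return.
        props.setProperty("max.poll.records", Integer.toString(maxPollRecords));
        return props;
    }
}
```

The result could be passed to the ResourceProcessor constructor in place of the original consumerConfig. Raising the interval only widens the window; it does not help if flushing can block for an unbounded time.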

The code that runs the consumer:

package my.package;

import static java.util.concurrent.TimeUnit.SECONDS;

import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.consumer.CommitFailedException;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.config.AutowireCapableBeanFactory;

import com.github.rholder.retry.RetryException;
import com.github.rholder.retry.Retryer;
import com.github.rholder.retry.RetryerBuilder;
import com.github.rholder.retry.StopStrategies;
import com.github.rholder.retry.WaitStrategies;
import com.google.common.base.Throwables;

public class ResourceProcessor implements Runnable {
    // Retries every exception forever, backing off along a Fibonacci sequence capped at 60 seconds
    private static final Retryer<Void> retryer = RetryerBuilder.<Void>newBuilder()
            .retryIfException()
            .withWaitStrategy(WaitStrategies.fibonacciWait(1000, 60, SECONDS))
            .withStopStrategy(StopStrategies.neverStop())
            .build();

    private final MyConfig config;
    private final int index;
    private final ResourcePrimaryStore resourceStore;
    private AutowireCapableBeanFactory beanFactory;

    private volatile KafkaConsumer<byte[], MyValue> consumer;
    private volatile SinkTask sink;

    private int processedSize = 0;
    private Instant lastFlushTime = Instant.now();
    private Properties consumerConfig;
    private String name;
    private Logger logger;

    public ResourceProcessor(
            Properties consumerConfig,
            MyConfig config,
            int index,
            ResourcePrimaryStore resourceStore,
            AutowireCapableBeanFactory beanFactory) {
        this.consumerConfig = consumerConfig;
        this.config = config;
        this.index = index;
        this.resourceStore = resourceStore;
        this.beanFactory = beanFactory;
    }

    public void run() {
        name = String.format("consumer-%d", index);
        logger = LoggerFactory.getLogger(name);
        Thread.currentThread().setName(name);
        logger.info("Starting");

        sink = createSink();
        consumer = new KafkaConsumer<>(consumerConfig);

        try {
            consumer.subscribe(config.topics(),
                    new PartitionsRevokeCallback(() -> flushAndCommitOffset("partitions revoke")));
            while (true) {
                ConsumerRecords<byte[], MyValue> records = consumer.poll(10000);
                records.forEach(v -> {
                    sink.put(v.value());
                    processedSize = processedSize + 1;
                });
                if (shouldFlush()) {
                    flushAndCommitOffset("periodical");
                }
            }
        } catch (WakeupException e) {
            // close() called wakeup(): flush, commit, and shut down gracefully
            flushAndCommitOffset("shutting down");
        } finally {
            consumer.close();
            sink.close();
        }
    }

    private boolean shouldFlush() {
        Duration waitUntilFlush = Duration.between(Instant.now(), lastFlushTime.plus(config.flushInterval()));
        return processedSize >= config.batchSize()
                || waitUntilFlush.isNegative() || waitUntilFlush.isZero();
    }

    private void flushAndCommitOffset(String reason) {
        logger.info("Flushing size: {}, delay: {} seconds, reason: {}",
                processedSize,
                Duration.between(lastFlushTime, Instant.now()).getSeconds(),
                reason);
        try {
            flushRetryingInfinitely();
            processedSize = 0;
            lastFlushTime = Instant.now();
            consumer.commitSync();
        } catch (CommitFailedException e) {
            logger.error("Commit failed. Should rejoin the group on next poll()", e);
        }
    }

    private void flushRetryingInfinitely() {
        try {
            retryer.call(() -> {
                logger.info("Partition assignment {}", consumer.assignment());
                sink.flush();
                return null;
            });
        } catch (ExecutionException | RetryException e) {
            // should never happen as we retry all exceptions
            throw Throwables.propagate(e);
        }
    }

    private SinkTask createSink() {
        try {
            return (SinkTask) beanFactory.createBean(Class.forName("my.package.SinkTask"));
        } catch (ClassNotFoundException e) {
            throw Throwables.propagate(e);
        }
    }

    public void close() {
        consumer.wakeup();
    }

}
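
One way to check whether the loop above really overruns max.poll.interval.ms is to time the gap between consecutive poll() calls. The following is a small diagnostic sketch under that assumption; PollGapWatch is a hypothetical helper, not part of the Kafka client API, and the limit passed to it has to be kept in sync with the consumer configuration by hand.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical diagnostic helper; not part of the Kafka client API.
final class PollGapWatch {
    private final Duration maxPollInterval;
    private Instant lastPoll;

    PollGapWatch(Duration maxPollInterval, Instant start) {
        this.maxPollInterval = maxPollInterval;
        this.lastPoll = start;
    }

    /**
     * Call immediately before each poll(). Returns the time elapsed since the
     * previous call and records now as the new reference point.
     */
    Duration markPoll(Instant now) {
        Duration gap = Duration.between(lastPoll, now);
        lastPoll = now;
        return gap;
    }

    /** True if the given gap is strictly longer than the configured limit. */
    boolean exceedsLimit(Duration gap) {
        return gap.compareTo(maxPollInterval) > 0;
    }
}
```

In run(), markPoll(Instant.now()) would be called right before consumer.poll(10000), logging a warning whenever exceedsLimit returns true. A warning shortly before the first CommitFailedException would support the explanation given in the exception message.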

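This broken state appeared after I stopped and then started the database, so flushRetryingInfinitely() kept retrying for a while, which prevented the KafkaConsumer from calling poll().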
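
Since the unbounded retry is what keeps the thread away from poll(), one option is to give the flush a time budget shorter than max.poll.interval.ms. The sketch below uses only the JDK and is not the guava-retrying API; BoundedRetry and retryWithin are hypothetical names, and the budget and pause are assumptions the caller has to choose.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical helper illustrating a retry loop with a time budget.
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Runs action until it succeeds or the budget is used up.
     * Returns true on success, false if the budget ran out or the thread was interrupted.
     */
    static boolean retryWithin(Duration budget, Duration pause, Callable<Void> action) {
        final long deadline = System.nanoTime() + budget.toNanos();
        while (true) {
            try {
                action.call();
                return true;
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    // Preserve the interrupt status for the caller.
                    Thread.currentThread().interrupt();
                    return false;
                }
                // Give up if another pause would take us past the deadline.
                if (System.nanoTime() + pause.toNanos() - deadline >= 0) {
                    return false;
                }
            }
            try {
                Thread.sleep(pause.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With something like this, flushAndCommitOffset() would call commitSync() only when retryWithin returns true, and otherwise return to the poll loop and try the flush again later. Whether it is acceptable to keep unflushed records in the sink across polls depends on the sink, which is not shown here. guava-retrying also provides stop strategies other than neverStop() that may serve the same purpose.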

KafkaConsumer reads from old partitions after a rebalance and does not try to update its state on poll()

0 answers: