Kafka Streams shuts down after IllegalStateException: No current assignment for partition

Time: 2019-12-04 12:27:55

Tags: apache-kafka apache-kafka-streams

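I have a Kafka Streams application that starts up and runs successfully. We run 4 instances of the application. Occasionally one of the instances is legitimately killed, which triggers several rounds of rebalancing until the old node is replaced.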

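Sometimes, during the rebalance, one or more previously healthy nodes fail. The logs show that the Streams application transitions to the PENDING_SHUTDOWN state immediately after receiving the following exception: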

java.lang.IllegalStateException: No current assignment for partition public.chat.message-28
    at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:256)
    at org.apache.kafka.clients.consumer.internals.SubscriptionState.resetFailed(SubscriptionState.java:418)
    at org.apache.kafka.clients.consumer.internals.Fetcher$2.onFailure(Fetcher.java:621)
    at org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
    at org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
    at org.apache.kafka.clients.consumer.internals.RequestFutureAdapter.onFailure(RequestFutureAdapter.java:30)
    at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onFailure(RequestFuture.java:209)
    at org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
    at org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:571)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:389)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:297)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
    at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:292)
    at org.apache.kafka.clients.consumer.internals.Fetcher.getAllTopicMetadata(Fetcher.java:275)
    at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1849)
    at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1827)
    at org.apache.kafka.streams.processor.internals.StoreChangelogReader.refreshChangelogInfo(StoreChangelogReader.java:259)
    at org.apache.kafka.streams.processor.internals.StoreChangelogReader.initialize(StoreChangelogReader.java:133)
    at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:79)
    at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:328)
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:866)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:804)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:773)

Before this error, we often also see INFO-level log entries reporting a DisconnectException:

 Error sending fetch request (sessionId=568252460, epoch=7) to node 4: org.apache.kafka.common.errors.DisconnectException

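I suspect the two are related, but so far I can't explain why.

Can anyone give me some hints about what might cause this issue, and any possible solutions?

Additional information:

  • Kafka 2.2.1
  • 32 partitions evenly distributed across 4 worker nodes
  • StreamsConfig settings: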
kafkaStreamProps.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 2);
kafkaStreamProps.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
kafkaStreamProps.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
kafkaStreamProps.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 120000);
kafkaStreamProps.put(StreamsConfig.TOPOLOGY_OPTIMIZATION, StreamsConfig.OPTIMIZE);
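
To make the setup above concrete, here is a rough sketch of how many active tasks each stream thread would carry before and after one instance is lost. It assumes one task per partition (a single sub-topology) and a plain round-robin spread. The class TaskLayout and the method tasksPerThread are hypothetical names used only for illustration. This is not the actual Kafka Streams assignor logic, and standby replicas are ignored.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Kafka Streams: approximates how active tasks
// are spread over stream threads with a simple round-robin.
class TaskLayout {

    // Returns the number of tasks per thread when `tasks` tasks are spread
    // over `instances * threadsPerInstance` stream threads.
    static int[] tasksPerThread(int tasks, int instances, int threadsPerInstance) {
        int threads = instances * threadsPerInstance;
        int[] load = new int[threads];
        for (int t = 0; t < tasks; t++) {
            load[t % threads]++;
        }
        return load;
    }

    public static void main(String[] args) {
        // Assumption: 32 partitions means 32 active tasks.
        // 4 instances x 4 threads (num.stream.threads = 4) = 16 threads.
        System.out.println(Arrays.toString(tasksPerThread(32, 4, 4)));
        // One instance killed: 3 instances x 4 threads = 12 threads.
        System.out.println(Arrays.toString(tasksPerThread(32, 3, 4)));
    }
}
```

Under these assumptions every thread holds 2 tasks while all 4 instances are up, and some threads have to take on a third task once an instance is gone, so the surviving nodes restore additional state stores during the rebalance.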

1 answer:

Answer 0: (score: 1)

This seems to be related to https://issues.apache.org/jira/browse/KAFKA-9073, which has been fixed in Kafka Streams 2.3.2.

If you can't wait for that release, you could try to create a private build using the changeset from this pull request: https://github.com/apache/kafka/pull/7630/files