Kafka Streams with state stores: messages reprocessed on application restart

Time: 2019-03-20 14:45:45

Tags: apache-kafka apache-kafka-streams

We have the following topology with two transformers, each of which uses a persistent state store:

kStreamBuilder.stream(inboundTopicName)
    .transform(() -> new FirstTransformer(FIRST_STATE_STORE), FIRST_STATE_STORE)
    .map((key, value) -> ...)
    .transform(() -> new SecondTransformer(SECOND_STATE_STORE), SECOND_STATE_STORE)
    .to(outboundTopicName);

The Kafka settings include auto.offset.reset: latest. After the app launched, I saw two internal compacted topics being created (which is expected): appId_inbound_firstStateStore-changelog and appId_inbound_secondStateStore-changelog.
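For reference, the setting mentioned above would be supplied roughly as follows. This is a minimal sketch using plain java.util.Properties with the string forms of the config keys; the application id and bootstrap servers are placeholders, not values from our setup. Note that Kafka Streams itself defaults auto.offset.reset to earliest when it is not set explicitly, so the override matters.

```java
import java.util.Properties;

final class StreamsSettingsSketch {

    // Builds the settings described in the question.
    // "my-streams-app" and "localhost:9092" are placeholder values (assumptions).
    static Properties buildSettings() {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");
        props.put("bootstrap.servers", "localhost:9092");
        // Applies only when the consumer group has no committed offset for a partition.
        props.put("auto.offset.reset", "latest");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildSettings();
        System.out.println("auto.offset.reset = " + props.getProperty("auto.offset.reset"));
    }
}
```

In a real application these properties would be passed to the KafkaStreams constructor together with the built topology.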

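Our app was down for two days, and after we started it again, messages were reprocessed from the very beginning for one particular partition (we have multiple partitions). I am aware that Kafka brokers before version 2.0 keep committed offsets for only about one day, so our offsets should have been removed by retention. But why were messages reprocessed from the beginning if we use auto.offset.reset: latest? Perhaps it is somehow related to the stateful operations or to the internal changelog topics.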
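The retention reasoning above can be checked with simple arithmetic. The sketch below assumes the broker runs with the pre-2.0 default of offsets.retention.minutes = 1440 (one day); we have not confirmed the actual value on our brokers, and the class and method names are made up for illustration.

```java
import java.time.Duration;

final class OffsetRetentionCheck {

    // Assumed broker setting: default offsets.retention.minutes before Kafka 2.0.
    static final Duration ASSUMED_RETENTION = Duration.ofMinutes(1440);

    // True when the downtime exceeds the retention window, i.e. the
    // committed offsets have probably been removed by the broker.
    static boolean offsetsLikelyExpired(Duration downtime, Duration retention) {
        return downtime.compareTo(retention) > 0;
    }

    public static void main(String[] args) {
        Duration downtime = Duration.ofDays(2);
        System.out.println("downtime minutes = " + downtime.toMinutes()); // 2880
        System.out.println("offsets likely expired = "
                + offsetsLikelyExpired(downtime, ASSUMED_RETENTION));     // true
    }
}
```

With two days of downtime (2880 minutes) against a 1440-minute window, the committed offsets would be gone, so on restart the consumer would fall back to whatever auto.offset.reset policy is in effect.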

I see the following logs (most of them repeated many times):

StoreChangelogReader Restoring task 0_55's state store firstStateStore from beginning of the changelog
Fetcher [Consumer clientId=xxx-restore-consumer, groupId=] Resetting offset for partition xxx-55 to offset 0
ConsumerCoordinator Setting newly assigned partitions
ConsumerCoordinator Revoking previously assigned partitions
StreamsPartitionAssignor Assigned tasks to clients
AbstractCoordinator Successfully joined group with generation
StreamThread partition revocation took xxx ms
Unsubscribed all topics or patterns and assigned partitions
AbstractCoordinator (Re-)joining group
Attempt to heartbeat failed since group is rebalancing
AbstractCoordinator Group coordinator xxx:9092 (id: xxx rack: null) is unavailable or invalid, will attempt rediscovery
FetchSessionHandler - [Consumer clientId=xxx-restore-consumer, groupId=] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 2: org.apache.kafka.common.errors.DisconnectException
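As a rough illustration of what the first log line describes, restoring a state store means replaying its changelog topic into the local store, which is separate from consuming the inbound topic. The toy model below is plain Java and not Kafka's actual implementation; the Record class and restore method are hypothetical names. It replays changelog records in order, with a null value treated as a delete (tombstone).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ChangelogRestoreSketch {

    // One changelog record; a null value marks a deletion (tombstone).
    static final class Record {
        final String key;
        final String value;

        Record(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Rebuilds the store contents by replaying the changelog from offset 0.
    static Map<String, String> restore(List<Record> changelog) {
        Map<String, String> store = new LinkedHashMap<>();
        for (Record record : changelog) {
            if (record.value == null) {
                store.remove(record.key);   // tombstone removes the key
            } else {
                store.put(record.key, record.value); // later values overwrite earlier ones
            }
        }
        return store;
    }

    public static void main(String[] args) {
        List<Record> changelog = Arrays.asList(
                new Record("a", "1"),
                new Record("b", "2"),
                new Record("a", "3"),
                new Record("b", null));
        System.out.println(restore(changelog)); // prints {a=3}
    }
}
```

If this reading is right, the "Resetting offset ... to offset 0" line from the restore consumer refers to a changelog partition, and by itself would not show that the inbound topic was re-read from the start.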

Kafka broker version: 0.11.0.2; Kafka Streams version: 2.1.0

0 answers:

No answers