I am running a Kafka Streams application on version 2.1.0. I found that after running for a while, my application (63 nodes) goes into the ERROR state one node after another, until eventually all 63 nodes are down. The exception is:
ERROR o.a.k.s.p.i.ProcessorStateManager - task [2_2] Failed to
flush state store KSTREAM-REDUCE-STATE-STORE-0000000014:
org.apache.kafka.streams.errors.StreamsException: task [2_2]
Abort sending since an error caught with a previous record
(key 110646599468 value InterimMessage [sessionStart=1567150872690,count=1]
timestamp 1567154490411) to topic item.interim due to
org.apache.kafka.common.errors.TimeoutException: Failed to update
metadata after 60000 ms.
You can increase producer parameter `retries` and `retry.backoff.ms`
to avoid this error.
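I did bump those two producer parameters (see the config list further down). For reference, this is roughly how I pass them in (a minimal sketch, not my exact code; the application id and bootstrap servers are placeholders, and I scope the values with producerPrefix() because I am not sure whether an unprefixed retries would reach the internal producer or only Kafka Streams itself, which has a retries setting of its own):

import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class ProducerRetrySettings {

    // Sketch: raise the internal producer's retries / retry.backoff.ms,
    // scoped with producerPrefix() so they explicitly target the producer
    // (Streams also defines a `retries` config at the Streams level).
    static Properties withProducerRetries() {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "client-autocreate"); // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "XXX:9092");       // placeholder

        props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRIES_CONFIG), 20);
        props.put(StreamsConfig.producerPrefix(ProducerConfig.RETRY_BACKOFF_MS_CONFIG), 80000);
        return props;
    }
}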
I enabled DEBUG logging and found that the exception occurs when the KStream requests a metadata update only for the internal topic, not for the target topic (item.interim is the target topic).
Normally:
[Producer clientId=client-autocreate-StreamThread-1-producer] Sending metadata
request (type=MetadataRequest,
topics=item.interim,test-KSTREAM-REDUCE-STATE-STORE-0000000014-changelog)
to node XXX:9092 (id: 7 rack: XXX)
But right before the exception, it is:
[Producer clientId=client-autocreate-StreamThread-1-producer] Sending metadata
request (type=MetadataRequest,
topics=test-KSTREAM-REDUCE-STATE-STORE-0000000014-changelog)
to node XXX:9092 (id: 7 rack: XXX)
Configs I have changed:
max.request.size=14000000
receive.buffer.bytes=32768
auto.offset.reset=latest
enable.auto.commit=false
default.api.timeout.ms=180000
cache.max.bytes.buffering=10485760
retries=20
retry.backoff.ms=80000
request.timeout.ms=120000
commit.interval.ms=100
num.stream.threads=1
session.timeout.ms=30000
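For completeness, this is roughly how I build those settings into the StreamsConfig (again a sketch; the bootstrap servers are a placeholder, and the consumerPrefix()/producerPrefix() scoping reflects my understanding of which client each setting belongs to):

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsSettings {

    static Properties build() {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "client-autocreate"); // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "XXX:9092");       // placeholder

        // Streams-level settings (no prefix needed).
        props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10485760);
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 100);
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 1);

        // Consumer-level settings, scoped to the consumer explicitly.
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG), "latest");
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG), 30000);
        props.put(StreamsConfig.consumerPrefix(ConsumerConfig.DEFAULT_API_TIMEOUT_MS_CONFIG), 180000);
        // enable.auto.commit is not set here: Streams disables auto commit on
        // its consumers anyway, so the effective value is false.

        // Producer-level settings (retries / retry.backoff.ms shown earlier).
        props.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_REQUEST_SIZE_CONFIG), 14000000);
        props.put(StreamsConfig.producerPrefix(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG), 120000);

        // receive.buffer.bytes exists on both clients; I leave it unprefixed,
        // my understanding being that it is then forwarded to each client
        // that defines a config with that name.
        props.put("receive.buffer.bytes", 32768);

        return props;
    }
}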
I am really confused. Can anyone help me understand why the producer sends different metadata requests? And is there a possible way to solve this problem? Thanks a lot!