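I've noticed that my Kafka Streams application goes into the ERROR state
after it has been unable to communicate with Kafka for a while. I'd like to find a way to make Kafka Streams essentially "retry forever" instead of entering the ERROR
state. The only workaround is to restart my Kafka Streams application, which is not ideal.
I set request.timeout.ms=2147483647 in my Kafka Streams configuration.
I noticed that this helps: it used to enter the ERROR state after about a minute,
and now it happens less often, but it still happens eventually.
Here is my Kafka Streams configuration: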
commit.interval.ms: 10000
cache.max.bytes.buffering: 0
retries: 2147483647
request.timeout.ms: 2147483647
retry.backoff.ms: 5000
num.stream.threads: 1
state.dir: /tmp/kafka-streams
producer.batch.size: 102400
producer.max.request.size: 31457280
producer.buffer.memory: 314572800
producer.max.in.flight.requests.per.connection: 10
producer.linger.ms: 0
consumer.max.partition.fetch.bytes: 31457280
consumer.receive.buffer.bytes: 655360
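For reference, here is a minimal sketch of the same settings expressed as a Java `Properties` object, using plain string keys so it needs nothing beyond the JDK. The class name `StreamsConfigSketch` and the helper `withLongerApiTimeout` are hypothetical names introduced only for illustration. The helper reflects an assumption that is not confirmed anywhere in this post: the 60000 ms timeout in the log excerpt does not match any value in the configuration above, and it happens to equal the documented default of the consumer setting `default.api.timeout.ms`, so raising that setting (through the `consumer.` prefix) might be worth trying.

```java
import java.util.Properties;

public class StreamsConfigSketch {

    /** Builds the 14 settings listed above as string key/value pairs. */
    static Properties build() {
        Properties props = new Properties();
        String intMax = String.valueOf(Integer.MAX_VALUE); // "2147483647"
        props.setProperty("commit.interval.ms", "10000");
        props.setProperty("cache.max.bytes.buffering", "0");
        props.setProperty("retries", intMax);
        props.setProperty("request.timeout.ms", intMax);
        props.setProperty("retry.backoff.ms", "5000");
        props.setProperty("num.stream.threads", "1");
        props.setProperty("state.dir", "/tmp/kafka-streams");
        props.setProperty("producer.batch.size", "102400");
        props.setProperty("producer.max.request.size", "31457280");
        props.setProperty("producer.buffer.memory", "314572800");
        props.setProperty("producer.max.in.flight.requests.per.connection", "10");
        props.setProperty("producer.linger.ms", "0");
        props.setProperty("consumer.max.partition.fetch.bytes", "31457280");
        props.setProperty("consumer.receive.buffer.bytes", "655360");
        return props;
    }

    /**
     * ASSUMPTION, not part of the original configuration: also raise the
     * consumer's default.api.timeout.ms, whose default (60000 ms) equals the
     * timeout reported in the log. Whether this avoids the ERROR state is untested.
     */
    static Properties withLongerApiTimeout(Properties base, int timeoutMs) {
        Properties props = new Properties();
        props.putAll(base); // copy so the original settings stay untouched
        props.setProperty("consumer.default.api.timeout.ms", String.valueOf(timeoutMs));
        return props;
    }

    public static void main(String[] args) {
        Properties props = withLongerApiTimeout(build(), Integer.MAX_VALUE);
        // Print every setting in "key: value" form, like the list above.
        props.forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

In a real application, a `Properties` object like this would be passed to `new KafkaStreams(topology, props)` together with the required `application.id` and `bootstrap.servers`, which are omitted here because they are not shown in the configuration above.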
Here is the relevant part of the Kafka Streams log:
[2019-06-07T22:18:07,223Z {StreamThread-1} WARN org.apache.kafka.clients.NetworkClient] [Consumer clientId=StreamThread-1-consumer, groupId=app-stream] 20 partitions have leader brokers without a matching listener, including [app-stream-tmp-store-changelog-5, app-stream-tmp-store-changelog-13, app-stream-tmp-store-changelog-9, app-stream-tmp-store-changelog-1, __consumer_offsets-10, __consumer_offsets-30, __consumer_offsets-18, __consumer_offsets-22, __consumer_offsets-34, __consumer_offsets-6]
[2019-06-07T22:18:08,662Z {StreamThread-1} ERROR org.apache.kafka.streams.processor.internals.AssignedStreamsTasks] stream-thread [StreamThread-1] Failed to commit stream task 0_14 due to the following error:
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing offsets {global-14=OffsetAndMetadata{offset=33038702, leaderEpoch=null, metadata=''}}
[2019-06-07T22:18:08,662Z {StreamThread-1} ERROR org.apache.kafka.streams.processor.internals.StreamThread] stream-thread [StreamThread-1] Encountered the following unexpected Kafka exception during processing, this usually indicate Streams internal errors:
org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before successfully committing offsets {global-2=OffsetAndMetadata{offset=25537237, leaderEpoch=null, metadata=''}}
[2019-06-07T22:18:08,662Z {StreamThread-1} INFO org.apache.kafka.streams.processor.internals.StreamThread] stream-thread [StreamThread-1] State transition from RUNNING to PENDING_SHUTDOWN
[2019-06-07T22:18:08,662Z {StreamThread-1} INFO org.apache.kafka.streams.processor.internals.StreamThread] stream-thread [StreamThread-1] Shutting down
[2019-06-07T22:18:08,704Z {StreamThread-1} INFO org.apache.kafka.clients.consumer.KafkaConsumer] [Consumer clientId=StreamThread-1-restore-consumer, groupId=null] Unsubscribed all topics or patterns and assigned partitions
[2019-06-07T22:18:08,704Z {StreamThread-1} INFO org.apache.kafka.clients.producer.KafkaProducer] [Producer clientId=StreamThread-1-producer] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
[2019-06-07T22:18:08,728Z {StreamThread-1} INFO org.apache.kafka.streams.processor.internals.StreamThread] stream-thread [StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
[2019-06-07T22:18:08,728Z {StreamThread-1} INFO org.apache.kafka.streams.KafkaStreams] stream-client [usxapgutpd01-] State transition from RUNNING to ERROR
[2019-06-07T22:18:08,728Z {StreamThread-1} ERROR org.apache.kafka.streams.KafkaStreams] stream-client [usxapgutpd01-] All stream threads have died. The instance will be in error state and should be closed.
[2019-06-07T22:18:08,728Z {StreamThread-1} INFO org.apache.kafka.streams.processor.internals.StreamThread] stream-thread [StreamThread-1] Shutdown complete