UnknownProducerIdException in Kafka Streams when exactly-once is enabled

Date: 2018-04-17 08:04:52

Tags: apache-kafka apache-kafka-streams

After enabling exactly-once processing on a Kafka Streams application, the following error appears in the logs:

ERROR o.a.k.s.p.internals.StreamTask - task [0_0] Failed to close producer due to the following error:

org.apache.kafka.streams.errors.StreamsException: task [0_0] Abort sending since an error caught with a previous record (key 222222 value some-value timestamp 1519200902670) to topic exactly-once-test-topic-v2 due to This exception is raised by the broker if it could not locate the producer metadata associated with the producerId in question. This could happen if, for instance, the producer's records were deleted because their retention time had elapsed. Once the last records of the producerId are removed, the producer's metadata is removed from the broker, and future appends by the producer will return this exception.
  at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:125)
  at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:48)
  at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:180)
  at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1199)
  at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:204)
  at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:187)
  at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:627)
  at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:596)
  at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:557)
  at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:481)
  at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74)
  at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:692)
  at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:101)
  at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:482)
  at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:474)
  at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:239)
  at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
  at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.errors.UnknownProducerIdException

We have reproduced the issue with a minimal test case that moves messages from a source stream to another stream without any transformation. The source stream contains millions of messages produced over several months. The KafkaStreams object is created with the following StreamsConfig:

  • StreamsConfig.PROCESSING_GUARANTEE_CONFIG = "exactly_once"
  • StreamsConfig.APPLICATION_ID_CONFIG = "some app id"
  • StreamsConfig.NUM_STREAM_THREADS_CONFIG = 1
  • ProducerConfig.BATCH_SIZE_CONFIG = 102400

The application is able to process some messages before the exception occurs.
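A minimal sketch of such a pass-through test, assuming Kafka Streams 1.1 and placeholder values for the application id, broker address, and topic names (our real values differ), could look like this:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;

    public class ExactlyOnceCopyApp {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "some-app-id");     // placeholder
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
            props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
            props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 1);
            // Byte-array serdes pass the records through untouched.
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
                      Serdes.ByteArray().getClass().getName());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
                      Serdes.ByteArray().getClass().getName());
            // Producer-level settings can be prefixed so they reach the internal producer.
            props.put(StreamsConfig.producerPrefix(ProducerConfig.BATCH_SIZE_CONFIG), 102400);

            StreamsBuilder builder = new StreamsBuilder();
            // Copy every record from the source topic to the sink topic, no transformation.
            builder.stream("source-topic").to("exactly-once-test-topic-v2");   // placeholders

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }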

Background information:

  • We are running a 5-node Kafka 1.1.0 cluster with 5 ZooKeeper nodes.
  • Multiple instances of the application are running.

Has anyone seen this problem before, or can anyone give us hints about what might be causing this behavior?

Update

We created a new 1.1.0 cluster from scratch and started processing new messages without any problems. However, when we imported old messages from the old cluster, we hit the same UnknownProducerIdException after a while.

Next, we tried setting cleanup.policy on the sink topic to compact while keeping retention.ms at 3 years. Now the error no longer occurs. However, messages seem to have been lost: the source offset is 106 million, while the sink offset is only 100 million.
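For reference, one way to apply such a topic-config change programmatically is the Java AdminClient (available since Kafka 0.11); this is only a sketch, with a placeholder broker address and topic name, and note that alterConfigs overwrites the topic's previous dynamic config:

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class SetSinkTopicConfig {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic = new ConfigResource(
                        ConfigResource.Type.TOPIC, "exactly-once-test-topic-v2");  // placeholder
                Config config = new Config(Arrays.asList(
                        new ConfigEntry("cleanup.policy", "compact"),
                        new ConfigEntry("retention.ms", "94608000000")));          // ~3 years in ms
                // Replaces the topic's dynamic config with the entries above.
                admin.alterConfigs(Collections.singletonMap(topic, config)).all().get();
            }
        }
    }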

1 Answer:

Answer 0 (score: 0)

As discussed in the comments, there currently seems to be a bug that can cause problems when replaying messages older than the (maximum configurable?) retention time.

At the time of writing this it is unresolved; the latest status can always be checked here:

https://issues.apache.org/jira/browse/KAFKA-6817