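Kafka producer TimeoutException

Date: 2018-11-09 09:35:48

Tags: apache-kafka kafka-producer-api apache-samza

I am running a Samza stream job that writes data to a Kafka topic. Kafka runs as a 3-node cluster, and the Samza job is deployed on YARN. We see many exceptions like this one in the container logs: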

 INFO [2018-10-16 11:14:19,410] [U:2,151,F:455,T:2,606,M:2,658] samza.container.ContainerHeartbeatMonitor:[ContainerHeartbeatMonitor:stop:61] - [main] - Stopping ContainerHeartbeatMonitor
ERROR [2018-10-16 11:14:19,410] [U:2,151,F:455,T:2,606,M:2,658] samza.runtime.LocalContainerRunner:[LocalContainerRunner:run:107] - [main] - Container stopped with Exception. Exiting process now.
org.apache.samza.SamzaException: org.apache.samza.SamzaException: Unable to send message from TaskName-Partition 15 to system kafka.
        at org.apache.samza.task.AsyncRunLoop.run(AsyncRunLoop.java:147)
        at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:694)
        at org.apache.samza.runtime.LocalContainerRunner.run(LocalContainerRunner.java:104)
        at org.apache.samza.runtime.LocalContainerRunner.main(LocalContainerRunner.java:149)
Caused by: org.apache.samza.SamzaException: Unable to send message from TaskName-Partition 15 to system kafka.
        at org.apache.samza.system.kafka.KafkaSystemProducer$$anon$1.onCompletion(KafkaSystemProducer.scala:181)
        at org.apache.kafka.clients.producer.internals.RecordBatch.done(RecordBatch.java:109)
        at org.apache.kafka.clients.producer.internals.RecordBatch.maybeExpire(RecordBatch.java:160)
        at org.apache.kafka.clients.producer.internals.RecordAccumulator.abortExpiredBatches(RecordAccumulator.java:245)
        at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:212)
        at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 5 record(s) for Topic3-16 due to 30332 ms has passed since last attempt plus backoff time

These three types of exception occur very frequently:

59088 org.apache.kafka.common.errors.TimeoutException: Expiring 115 record(s) for Topic3-1 due to 30028 ms has passed since last attempt plus backoff time

61015 org.apache.kafka.common.errors.TimeoutException: Expiring 60 record(s) for Topic3-1 due to 74949 ms has passed since batch creation plus linger time

62275 org.apache.kafka.common.errors.TimeoutException: Expiring 176 record(s) for Topic3-4 due to 74917 ms has passed since last append

Please help me understand what the problem is here. Every time this failure occurs, the Samza container restarts.

1 answer:

Answer 0 (score: 2)

The error indicates that records are being put into the producer's queue faster than the client can send them.

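When the producer sends messages, they are first stored in a buffer (before being sent to the target broker), and the records are grouped into batches to increase throughput. Once a new record is added to a batch, the batch must be sent within a configurable time window controlled by request.timeout.ms (30 seconds by default). If the batch stays in the queue longer than that, a TimeoutException is thrown, and the batch's records are removed from the queue and never delivered to the broker.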
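To make the expiry mechanism concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions, not Kafka's actual accumulator logic: one batch is created at a fixed interval, a single sender handles batches one at a time, and a batch that has already waited longer than the timeout when the sender reaches it is dropped. The function name `simulate` and all the numbers are made up for illustration.

```python
def simulate(arrival_every_ms, send_takes_ms, timeout_ms, total_batches):
    """Toy model of producer batch expiry (not Kafka's real implementation).

    Returns a (sent, expired) tuple.
    """
    sender_free_at = 0  # time at which the single sender is idle again
    sent = expired = 0
    for i in range(total_batches):
        created = i * arrival_every_ms
        # The sender picks the batch up once it is free and the batch exists.
        start = max(sender_free_at, created)
        if start - created > timeout_ms:
            # The batch waited too long in the queue: it is dropped, which
            # is the situation the TimeoutException reports.
            expired += 1
            continue
        sent += 1
        sender_free_at = start + send_takes_ms
    return sent, expired


# Sender keeps up with the arrival rate: nothing expires.
print(simulate(100, 100, 30_000, 200))
# Sender is 4x slower than the arrival rate: the backlog grows until
# batches start to exceed the 30 s window.
print(simulate(100, 400, 30_000, 200))
# Same slow sender, longer window: this short burst now fits.
print(simulate(100, 400, 120_000, 200))
```

In this model a longer timeout absorbs a burst of finite length, but if the sender is permanently slower than the arrival rate, the backlog keeps growing and batches will eventually expire whatever the timeout is.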

Increasing the value of request.timeout.ms should do the trick.

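If that does not work, you can also try decreasing batch.size so that batches are sent more often (though each one will contain fewer messages), and make sure that linger.ms is set to 0 (which is the default).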
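For a Samza job, these producer settings would normally go into the job's properties file. The Python sketch below just renders them as property lines. It assumes that the system is named `kafka` (as the log message "to system kafka" suggests) and that your Samza version forwards `systems.<system-name>.producer.*` keys to the Kafka producer, so check that against the Samza configuration reference. The helper name `samza_producer_overrides` and the values are purely illustrative, not recommendations.

```python
def samza_producer_overrides(system_name, overrides):
    """Render Kafka producer settings as Samza job property lines.

    Assumes Samza passes systems.<system-name>.producer.* through to the
    Kafka producer; verify this for the Samza version in use.
    """
    prefix = f"systems.{system_name}.producer."
    return [f"{prefix}{key}={value}" for key, value in overrides.items()]


# Illustrative values only; tune them against your own workload.
overrides = {
    "request.timeout.ms": 60000,  # double the 30 s mentioned in the answer
    "batch.size": 8192,           # smaller batches, sent more often
    "linger.ms": 0,               # do not wait to fill a batch
}

lines = samza_producer_overrides("kafka", overrides)
print("\n".join(lines))
```

The printed lines can be pasted into the job's properties file.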

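Note that a configuration change only takes effect after a restart. The parameters above are producer-side settings, so it is the producing application (here, the Samza job) that needs to be restarted, rather than the Kafka brokers.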

If you still get the error, I suspect something is wrong with your network. Have you enabled SSL?