Why does Kafka Streams stop producing data after running for a long time?

Time: 2018-12-30 08:10:27

Tags: apache-kafka apache-kafka-streams

Kafka Streams stops producing data after running for a long time (longer than the configured expiration time).

After logging the error message, the Kafka Streams instance is also dead.

The exception is below:

org.apache.kafka.common.KafkaException: Cannot execute transactional method because we are in an error state
at org.apache.kafka.clients.producer.internals.TransactionManager.maybeFailWithError(TransactionManager.java:784)
at org.apache.kafka.clients.producer.internals.TransactionManager.beginAbort(TransactionManager.java:229)
at org.apache.kafka.clients.producer.KafkaProducer.abortTransaction(KafkaProducer.java:660)
at org.apache.kafka.streams.processor.internals.StreamTask.closeSuspended(StreamTask.java:493)
at org.apache.kafka.streams.processor.internals.StreamTask.close(StreamTask.java:553)
at org.apache.kafka.streams.processor.internals.AssignedTasks.close(AssignedTasks.java:405)
at org.apache.kafka.streams.processor.internals.TaskManager.shutdown(TaskManager.java:260)
at org.apache.kafka.streams.processor.internals.StreamThread.completeShutdown(StreamThread.java:1111)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:730)
org.apache.kafka.common.KafkaException: Unexpected error in AddOffsetsToTxnResponse: The producer attempted to use a producer id which is not currently assigned to its transactional id
at org.apache.kafka.clients.producer.internals.TransactionManager$AddOffsetsToTxnHandler.handleResponse(TransactionManager.java:1237)
at org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:907)
at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:101)
at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:482)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:474)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
at java.lang.Thread.run(Thread.java:834)

Versions:

  1. Kafka broker: 2.0.0
  2. kafka-clients: 1.1.1
  3. kafka-streams: 1.1.1

The following (broker and producer) options are all left at their default settings:

  1. TRANSACTION_TIMEOUT_CONFIG
  2. transactional.id.expiration.ms
  3. transaction.max.timeout.ms
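
For reference (the question says these are left at defaults), the default values in Kafka 2.0-era brokers and clients can be sketched as below; the numbers are taken from the Kafka configuration documentation as I recall it, so double-check them against your broker version. Note that a streams app that makes no progress for longer than transactional.id.expiration.ms is a likely way to end up with exactly this INVALID_PRODUCER_ID_MAPPING error:

```java
import java.util.Properties;

// Sketch of the default values of the three transaction-related settings
// mentioned above, as documented for Kafka 2.0-era brokers and clients.
// Treat these numbers as a reference and verify against your broker version.
public class TxnDefaults {

    // Producer-side default: transaction.timeout.ms (TRANSACTION_TIMEOUT_CONFIG)
    static Properties producerDefaults() {
        Properties p = new Properties();
        p.put("transaction.timeout.ms", "60000"); // 1 minute
        return p;
    }

    // Broker-side defaults
    static Properties brokerDefaults() {
        Properties p = new Properties();
        p.put("transactional.id.expiration.ms", "604800000"); // 7 days
        p.put("transaction.max.timeout.ms", "900000");        // 15 minutes
        return p;
    }

    public static void main(String[] args) {
        System.out.println(producerDefaults().getProperty("transaction.timeout.ms"));
        System.out.println(brokerDefaults().getProperty("transactional.id.expiration.ms"));
        System.out.println(brokerDefaults().getProperty("transaction.max.timeout.ms"));
    }
}
```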

Code:

Properties properties = new Properties();
// application.id is required by Kafka Streams; "my-app" is a placeholder.
properties.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
properties.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);

StreamsBuilder builder = new StreamsBuilder();
builder.stream("from", Consumed.with(Serdes.Integer(), Serdes.String()))
       .peek((key, value) -> System.out.println(value))
       .to("to", Produced.with(Serdes.Integer(), Serdes.String(),
                               (key, value, numPartitions) -> key % numPartitions));

KafkaStreams streams = new KafkaStreams(builder.build(), properties);
streams.start();
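
One thing worth double-checking in the snippet above, independent of the transaction error: the custom partitioner logic `key % numPartitions` returns a negative (and therefore invalid) partition for negative integer keys. A minimal sketch of the difference, using `Math.floorMod` as a safer variant (this is my own suggestion, not part of the original question):

```java
public class PartitionerSketch {
    // Mirrors the StreamPartitioner lambda from the question:
    // (key, value, numPartitions) -> key % numPartitions
    static int naive(int key, int numPartitions) {
        return key % numPartitions; // negative for negative keys
    }

    // Safer variant: Math.floorMod always yields a value in [0, numPartitions)
    static int safe(int key, int numPartitions) {
        return Math.floorMod(key, numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(naive(-7, 4)); // -3: not a valid partition
        System.out.println(safe(-7, 4));  // 1: valid
        System.out.println(safe(7, 4));   // 3: same as naive for non-negative keys
    }
}
```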

1 answer:

Answer 0: (score: 1)

From the error message, there seem to be some unknown issues here:

  1. In the internal producer, we do not handle INVALID_PRODUCER_ID_MAPPING specifically in AddOffsetsToTxnHandler#handleResponse, so it results in a fatal error and a KafkaException is thrown.

  2. In Streams, we swallow ProducerFencedException, but since 1) throws a fatal KafkaException instead, the thread dies directly.

The behavior of 1) is by design, but I admit there are indeed some second thoughts about it:

a. Generally speaking, producer fencing cases, including INVALID_PRODUCER_ID_MAPPING, should be handled better than in 1). This is captured in https://cwiki.apache.org/confluence/display/KAFKA/KIP-360%3A+Improve+handling+of+unknown+producer

b. The transactional producer should distinguish better between "fatal" and non-fatal errors; the latter should be handled internally rather than handed to the caller. One quick thought is that, aside from producer fencing errors, all the other errors we have designed so far should be treated as non-fatal and hence handled internally.
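
The classification proposed in b. could be sketched with stand-in exception types. These classes are illustrative only, not the actual Kafka ones (although in the real client ProducerFencedException does extend KafkaException): fencing errors would remain fatal and be surfaced to the caller, while everything else, including the producer-id mapping error from this question, would be handled internally:

```java
public class ErrorClassification {
    // Stand-ins for the real Kafka exception hierarchy (illustrative only).
    static class KafkaException extends RuntimeException {}
    static class ProducerFencedException extends KafkaException {}
    static class InvalidProducerIdMappingException extends KafkaException {}

    // Sketch of the classification proposed in b.: only producer fencing
    // errors are fatal; everything else would be handled internally
    // (e.g. by retrying or re-initializing the transactional producer).
    static boolean isFatal(KafkaException e) {
        return e instanceof ProducerFencedException;
    }

    public static void main(String[] args) {
        System.out.println(isFatal(new ProducerFencedException()));           // fatal
        System.out.println(isFatal(new InvalidProducerIdMappingException())); // not fatal
    }
}
```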