How best to handle SerializationException from the KafkaConsumer poll method

Asked: 2018-03-15 11:10:07

Tags: java apache-kafka kafka-consumer-api

In a Kafka consumer poll loop, when the poll method throws a SerializationException, is there a way to skip this message (a.k.a. the "poison pill") and continue consuming the next event from the topic?

I could catch the exception and use the consumer.seek() method to move the offset past the bad message, but that method takes a partition and an offset as input parameters. Is there a way to obtain those values?
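For context, a minimal sketch of the kind of poll loop involved (the consumer setup, Avro value type and record handling are illustrative, not taken from the linked repo):

import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.SerializationException;

public class PollLoopSketch {
    static void pollLoop(KafkaConsumer<String, GenericRecord> consumer) {
        while (true) {
            try {
                ConsumerRecords<String, GenericRecord> records = consumer.poll(1000L);
                for (ConsumerRecord<String, GenericRecord> record : records) {
                    // handle the record (application-specific)
                }
            } catch (SerializationException se) {
                // The poisoned record is never returned by poll(), so there is
                // no ConsumerRecord here to read the partition/offset from.
                // consumer.seek(topicPartition, offset + 1) would skip past it,
                // but first those two values have to come from somewhere.
            }
        }
    }
}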

I have example code in a GitHub repo. To run the example:

$ git clone https://github.com/bjornhjelle/kafka-streams-examples-gradle.git
$ cd kafka-streams-examples-gradle
$ ./gradlew build -x test
$ ./gradlew test --tests no.test.SerializationExceptionExample

The example produces three events to Kafka. The second event causes a SerializationException. The exception is caught and logged, and at that point I would like to move the offset past the event. Instead, the exception is thrown again on the next poll, so the third event is never consumed and the test fails.

I am aware of this open issue on the same topic, but it mentions Kafka client versions < 0.10.0.1, while I am using version 1.0.0: https://issues.apache.org/jira/browse/KAFKA-4740

I also know I could work around this by using Kafka Streams and its new facility for handling poison pills (KIP-161: streams deserialization exception handlers).
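For reference, that Streams-side workaround amounts to a single config entry; a minimal sketch (the application id and bootstrap servers are illustrative):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;

public class StreamsPoisonPillConfig {
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "example-app");        // illustrative
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // illustrative
        // KIP-161: log the offending record and continue instead of failing
        props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
                LogAndContinueExceptionHandler.class);
        return props;
    }
}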

What led me to look into this in the first place is the exception below (the example code triggers a different SerializationException, as I was unable to recreate this one):

Exception in thread "SimpleAsyncTaskExecutor-3" org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition apcitxpt-1 at offset 339798. If needed, please seek past the record to continue consumption.
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: java.nio.BufferUnderflowException
at java.nio.Buffer.nextGetIndex(Buffer.java:500)
at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:135)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.getByteBuffer(AbstractKafkaAvroDeserializer.java:77)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:119)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:93)
at io.confluent.kafka.serializers.KafkaAvroDeserializer.deserialize(KafkaAvroDeserializer.java:55)
at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:65)
at org.apache.kafka.common.serialization.ExtendedDeserializer$Wrapper.deserialize(ExtendedDeserializer.java:55)
at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:923)
at org.apache.kafka.clients.consumer.internals.Fetcher.access$2600(Fetcher.java:93)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.fetchRecords(Fetcher.java:1100)
at org.apache.kafka.clients.consumer.internals.Fetcher$PartitionRecords.access$1200(Fetcher.java:949)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchRecords(Fetcher.java:570)
at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:531)
at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1170)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1103)
at no.ruter.nextgen.kafkaConsumerRunners.ApcInitRunner.run(ApcInitRunner.java:63)
at java.lang.Thread.run(Thread.java:745)

1 Answer:

Answer 0 (score: 0)

The solution I found was to parse the SerializationException message to extract the data needed: topic name, partition and offset:

catch (SerializationException se) {
    // Message format: "Error deserializing key/value for partition <topic>-<partition> at offset <offset>. If needed, ..."
    String message = se.getMessage();
    String topicPartitionPart = message
            .split("Error deserializing key/value for partition ")[1]
            .split(" at offset ")[0];
    // Split on the last '-' so topic names that themselves contain hyphens still parse correctly
    int lastDash = topicPartitionPart.lastIndexOf('-');
    String topic = topicPartitionPart.substring(0, lastDash);
    int partition = Integer.parseInt(topicPartitionPart.substring(lastDash + 1));
    // Offsets are 64-bit, so parse as long rather than int
    long offset = Long.parseLong(message.split(" at offset ")[1].split("\\.")[0]);

    TopicPartition topicPartition = new TopicPartition(topic, partition);
    logger.debug("Skipping {}-{} offset {}", topic, partition, offset);
    consumer.seek(topicPartition, offset + 1L);
}
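
Note that this couples the workaround to the exact wording of the exception message, which is not part of Kafka's public API and may change between client versions, so it is best treated as a stopgap until the open issue linked in the question provides a proper way to surface the partition and offset.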