Unable to deserialize data from Kafka

Asked: 2018-09-17 15:27:29

Tags: spring-boot apache-spark apache-kafka apache-spark-sql

In a Spring Boot application I use Kafka and Spark: Spark reads a stream from Kafka, transforms the data, and finally writes the result back to Kafka:

StreamingQuery kafka = scoring
                .writeStream()
                .format("kafka")
                .outputMode(OutputMode.Complete())
                .option("kafka.bootstrap.servers", bootstrapServers)
                .option("topic", outputTopic)
                .option("checkpointLocation", "~/Desktop/checkpoint")
                .queryName("urlCounterKafkaStream")
                .start();

The data Spark sends has two fields (name, count).

On the Kafka listener application I implemented the following simple deserializer:

import org.springframework.kafka.support.serializer.JsonDeserializer;

public class RSSItemDeserializer extends JsonDeserializer<RSSItemDTO> {
    public RSSItemDeserializer() {
        super(RSSItemDTO.class);
    }
}
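
For reference, a minimal sketch of what RSSItemDTO presumably looks like, given the two fields (name, count) mentioned above; the class is not shown in the question, so the field types and accessors here are assumptions:

// Hypothetical DTO matching the (name, count) payload; not taken from the question.
public class RSSItemDTO {

    private String name;
    private long count;

    // Jackson needs a no-args constructor to create the object during deserialization.
    public RSSItemDTO() {
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public long getCount() { return count; }
    public void setCount(long count) { this.count = count; }
}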

and registered the deserializer in application.properties:

spring.kafka.consumer.value-deserializer=com.noname.deserializer.RSSItemDeserializer
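
For completeness, a minimal sketch of the listener side, assuming Spring Kafka's @KafkaListener; the class name and group id are placeholders, and the topic name is taken from the exception below:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

// Hypothetical listener; class name and groupId are placeholders, not from the question.
@Component
public class RSSItemListener {

    @KafkaListener(topics = "urlCounterStream", groupId = "rss-consumer")
    public void onMessage(RSSItemDTO item) {
        // Invoked only after RSSItemDeserializer has successfully produced an RSSItemDTO.
        System.out.println("Received: " + item);
    }
}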

But I get a serialization exception:

org.apache.kafka.common.errors.SerializationException: Error deserializing key/value for partition urlCounterStream-0 at offset 0. If needed, please seek past the record to continue consumption.
Caused by: org.apache.kafka.common.errors.SerializationException: Can't deserialize data [[104, 116, 116, 112, 115, 58, 47, 47, 119, 119, 119, 46, 48, 53, 53, 50, 46, 117, 97]] from topic [urlCounterStream]
Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'https': was expecting ('true', 'false' or 'null')
at [Source: (byte[])"https://www.0552.ua"; line: 1, column: 7]

Am I missing something? How can I fix this and deserialize the data?

Thanks!

1 Answer:

Answer 0 (score: 0)

My problem was that I assumed Spark sends the data as JSON by default. The solution in this case is to call toJSON() on the result before sending it to Kafka:

StreamingQuery kafka = scoring.toJSON()
                .writeStream()
                ...
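
Putting it together, a sketch of the corrected sink; everything except the toJSON() call is copied from the query in the question:

// toJSON() turns each row into a JSON string in a single "value" column,
// which the Kafka sink then writes as the record value.
StreamingQuery kafka = scoring
                .toJSON()
                .writeStream()
                .format("kafka")
                .outputMode(OutputMode.Complete())
                .option("kafka.bootstrap.servers", bootstrapServers)
                .option("topic", outputTopic)
                .option("checkpointLocation", "~/Desktop/checkpoint")
                .queryName("urlCounterKafkaStream")
                .start();

Each record value is then a JSON document along the lines of {"name":"https://www.0552.ua","count":1} (illustrative), which the Jackson-based RSSItemDeserializer can parse. Before the change the consumed value was just the raw string, as the exception shows, which is why Jackson choked on the token 'https'. An alternative with the same effect is .selectExpr("to_json(struct(*)) AS value").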

Maybe this will help someone.