I'm working on a Spark Streaming application that reads JSON messages from a Kafka queue and parses them into case classes for further processing. The case classes are structured as follows:
case class KafkaPayload[T](
headerInfo: String,
data: T
)
case class DataSourceA(
field1: String,
field2: String
)
case class DataSourceB(
fieldA: String,
fieldB: String
)
where T is data from different sources, each of which writes to the Kafka topic with a different structure/fields. The parsing function looks like this:
def parseRecords[T: ClassTag](incomingStream: DStream[ConsumerRecord[String, String]]): DStream[T] = {
  val returnStream = incomingStream.mapPartitions(records => {
    val mapper = new ObjectMapper()
    mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
    mapper.registerModule(DefaultScalaModule)
    records.flatMap(record => {
      try {
        Some(mapper.readValue(record.value(), classTag[T].runtimeClass))
      } catch {
        case e: Exception => None
      }
    })
  }, preservePartitioning = true)
  return returnStream.asInstanceOf[DStream[T]]
}
and the calling code:
val aRecords = KafkaFunctions.parseRecords[KafkaPayload[DataSourceA]](JsonRecords)
val bRecords = KafkaFunctions.parseRecords[KafkaPayload[DataSourceB]](JsonRecords)
The problem I'm running into is that the data comes back as:
Some(KafkaPayload(test,Map(field1 -> one, field2 -> two)))
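(As far as I can tell, the `Map` appears because of JVM type erasure: a `ClassTag` only carries the erased runtime class, so Jackson is handed `classOf[KafkaPayload[_]]` with no information about `T`, and binds the untyped `data` field to a `Map`. A minimal illustration, reusing the case classes above:)

```scala
import scala.reflect.classTag

case class KafkaPayload[T](headerInfo: String, data: T)
case class DataSourceA(field1: String, field2: String)

// The ClassTag only records the erased class: the type argument is gone.
val cls = classTag[KafkaPayload[DataSourceA]].runtimeClass
// cls == classOf[KafkaPayload[_]] -- Jackson never sees DataSourceA here,
// so it falls back to deserializing `data` as a Map.
```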
I tried changing the function to take a `TypeReference` from `com.fasterxml.jackson.core.`type`.TypeReference` instead, but that caused serialization problems. Is there a neater solution to this? Thanks!
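One direction I've been considering (a sketch only, assuming jackson-module-scala's `ScalaObjectMapper` mixin is available; in older releases it lives under the `experimental` package): require an implicit `Manifest[T]` instead of a `ClassTag`. A `Manifest` preserves the full type `KafkaPayload[DataSourceA]` and is serializable, so capturing it in the Spark closure should avoid the serialization issues an anonymous `TypeReference` subclass causes:

```scala
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.streaming.dstream.DStream

import scala.util.Try

// Sketch: Manifest[T] keeps the generic type argument, so Jackson can
// bind `data` to DataSourceA/DataSourceB rather than a Map.
def parseRecords[T: Manifest](incomingStream: DStream[ConsumerRecord[String, String]]): DStream[T] = {
  incomingStream.mapPartitions({ records =>
    // Built once per partition, so the mapper itself is never serialized.
    val mapper = new ObjectMapper() with ScalaObjectMapper
    mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
    mapper.registerModule(DefaultScalaModule)
    // ScalaObjectMapper#readValue[T: Manifest] uses the full type information.
    records.flatMap(record => Try(mapper.readValue[T](record.value())).toOption)
  }, preservePartitioning = true)
}
```

The call sites would stay exactly as above, e.g. `parseRecords[KafkaPayload[DataSourceA]](JsonRecords)`.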