How to parse JSON strings in Spark Structured Streaming with a custom JsonParser?

Asked: 2018-01-11 06:29:32

Tags: json apache-spark parsing apache-kafka spark-structured-streaming

The user supplies a CustomJsonParser that parses only part of a JSON string into a CustomObject, rather than parsing the whole document. How can this CustomJsonParser be used to transform JSON strings in Spark Structured Streaming, instead of the built-in from_json / get_json_object functions?
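For contrast, the built-in route the question wants to avoid looks roughly like this. This is a minimal sketch; the schema and field names are assumptions for illustration, not from the original post:

```scala
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

// Hypothetical schema for illustration only -- from_json requires the
// full schema up front, which is exactly what a partial custom parser avoids
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType)
))

val parsedDF = messagesDF.select(from_json($"value", schema).as("message"))
```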

Sample code:

val jsonDF = spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", kafkaBrokers)
            .option("subscribe", kafkaConsumeTopicName)
            .option("group.id", kafkaConsumerGroupId)
            .option("startingOffsets", startingOffsets)
            .option("auto.offset.reset", autoOffsetReset)
            .option("key.deserializer", classOf[StringDeserializer].getName)
            .option("value.deserializer", classOf[StringDeserializer].getName)
            .option("enable.auto.commit", "false")
            .load()

val messagesDF = jsonDF.selectExpr("CAST(value AS STRING)")

spark.udf.register("parseJson", (json: String) =>
    customJsonParser.parseJson(json)
)

val objDF = messagesDF.selectExpr("parseJson(value) AS message")

val query = objDF.writeStream
            .outputMode(OutputMode.Append())
            .format("console")
            .start()

query.awaitTermination()

It fails at runtime with the following error:

    Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type com.xxx.xxxEntity is not supported
        at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:755)
        at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:693)
        at org.apache.spark.sql.UDFRegistration.register(UDFRegistration.scala:159)
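The stack trace indicates that Spark cannot derive a Catalyst schema for the UDF's return type com.xxx.xxxEntity: spark.udf.register only accepts return types Spark SQL knows how to encode (primitives, case classes of supported types, and so on). One common workaround, sketched below under the assumption that the parser's output can be mirrored by a case class, is to skip the UDF entirely and use a typed Dataset.map with an implicitly derived Encoder. The Message fields and the getId/getName accessors are hypothetical names, not from the original post:

```scala
import spark.implicits._

// Hypothetical case class mirroring the fields of com.xxx.xxxEntity;
// Spark can derive an Encoder for case classes of supported field types
case class Message(id: Long, name: String)

val objDS = messagesDF
  .as[String]                                     // Dataset[String] of raw JSON
  .map { json =>
    val entity = customJsonParser.parseJson(json) // custom partial parsing
    Message(entity.getId, entity.getName)         // copy into an encodable type
  }
```

With a Dataset[Message] in hand, the rest of the pipeline (writeStream to the console sink) works unchanged.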

0 Answers:

There are no answers yet.