Let's say we are processing a Kafka topic with Spark Streaming. The topic contains JSON messages. If, for whatever reason, a message's JSON does not match the expected schema, the job throws an org.apache.spark.sql.AnalysisException. What I need is to extract the JSON that was being processed when the exception was raised, repackage it inside another JSON envelope, and write it to a Kafka error topic. Is this possible? Here is what I have so far:
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.get_json_object;

JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
        jssc, LocationStrategies.PreferConsistent(), consumerStrategy);

stream.foreachRDD(rdd -> {
    // Pull the raw JSON payload out of each Kafka record.
    JavaRDD<String> rowRDD = rdd.map(record -> record.value());
    Dataset<Row> pdataset = spark.read().json(rowRDD);
    try {
        Dataset<String> pdatasetjson = pdataset
                .withColumn("name", get_json_object(col("value"), "$.name"))
                .withColumn("ssn", get_json_object(col("value"), "$.ssn"))
                .withColumn("gender", get_json_object(col("value"), "$.gender"))
                .withColumn("maritalStatus", get_json_object(col("value"), "$.maritalStatus"))
                .where(col("name").notEqual(""))
                .toJSON();
    } catch (Exception e) {
        // Need to extract the JSON record being worked on here
    }
});
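
As far as I understand, the AnalysisException is raised when the query plan for the whole micro-batch is analyzed, so at that point there is no single "current" record to grab. The only workaround I have come up with is to skip Spark SQL for the validation step and check each record myself, routing the bad ones to the error topic with a plain KafkaProducer inside foreachPartition. Below is a minimal sketch of that idea; the broker address, the person-errors topic name, and the Jackson-based check on the name field are all placeholders for illustration, not a real implementation:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

stream.foreachRDD(rdd -> {
    rdd.foreachPartition(records -> {
        ObjectMapper mapper = new ObjectMapper();
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (records.hasNext()) {
                String json = records.next().value();
                try {
                    // Hypothetical schema check: the record must parse as JSON
                    // and carry a non-null "name" field.
                    JsonNode node = mapper.readTree(json);
                    if (!node.hasNonNull("name")) {
                        throw new IllegalArgumentException("missing required field: name");
                    }
                    // ... process the valid record here ...
                } catch (Exception e) {
                    // Repackage the offending payload in an error envelope
                    // and send it to the error topic.
                    ObjectNode envelope = mapper.createObjectNode();
                    envelope.put("error", e.getMessage());
                    envelope.put("payload", json);
                    producer.send(new ProducerRecord<>("person-errors", envelope.toString()));
                }
            }
        }
    });
});

Opening a producer per partition per batch is obviously wasteful; a lazily created per-executor producer would be better, but I left that out to keep the sketch short. Is there a cleaner way to do this while staying inside Spark SQL, i.e. catching the AnalysisException and still getting at the record that caused it?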