Question

I found a few ideas for Scala but could not get them to work in Java, so I am posting a new question.

I need to parse the input JSON that arrives in the "value" column of a stream read from a Kafka topic.

Dataset<Row> output = df.select(functions.from_json(df.col("value"), schema));

StructType schema = new StructType();
schema.add("Id", DataTypes.StringType);
schema.add("Type", DataTypes.StringType);
schema.add("KEY", DataTypes.StringType);
schema.add("condition", DataTypes.IntegerType);
schema.add("seller_Id", DataTypes.IntegerType);
schema.add("seller_Name", DataTypes.StringType);
schema.add("isActive", DataTypes.BooleanType);

I got as far as the following, which prints to the console sink:

StreamingQuery query = output.writeStream().format("console").start();

+-------------------------+ 
|     jsontostructs(value)|
+-------------------------+
|                    []   |
+-------------------------+

Please advise how to get the individual columns out of this struct.

Answer 1

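So basically, the from_json function needs to be used together with schema.json() to get the schema as a String (similar to what Filip mentioned above for Scala). Hope this helps someone else.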

// StructType is immutable: add() returns a new StructType,
// so the calls are chained and the final result is assigned.
StructType schema = new StructType()
    .add("Id", DataTypes.StringType)
    .add("Type", DataTypes.StringType)
    .add("KEY", DataTypes.StringType)
    .add("condition", DataTypes.IntegerType)
    .add("seller_Id", DataTypes.IntegerType)
    .add("seller_Name", DataTypes.StringType)
    .add("isActive", DataTypes.BooleanType);

Dataset<Row> output = df.select(from_json(df.col("value"), DataType.fromJson(schema.json())).as("data")).select("data.*");

The final select flattens the struct directly into the fields defined in the schema.
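One detail that may explain the empty `[]` struct in the question: Spark's StructType is immutable, so `add()` returns a new StructType rather than modifying the receiver. If the return value is discarded, `schema` stays empty. The sketch below illustrates that behaviour in plain Java. The `Schema` class is a hypothetical stand-in written for this example, not Spark's API; it only mimics the "add returns a new object" contract.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ImmutableSchemaDemo {

    // Hypothetical stand-in for an immutable schema type (NOT Spark's StructType).
    public static final class Schema {
        private final List<String> fields;

        public Schema() {
            this.fields = Collections.emptyList();
        }

        private Schema(List<String> fields) {
            this.fields = Collections.unmodifiableList(fields);
        }

        // Never mutates this object: copies the fields and returns a new Schema.
        public Schema add(String name) {
            List<String> copy = new ArrayList<>(fields);
            copy.add(name);
            return new Schema(copy);
        }

        public int size() {
            return fields.size();
        }
    }

    // Pattern from the question: each add() result is thrown away.
    public static Schema discarded() {
        Schema schema = new Schema();
        schema.add("Id");
        schema.add("Type");
        return schema; // still the original, empty schema
    }

    // Chained pattern: each add() result feeds the next call.
    public static Schema chained() {
        return new Schema().add("Id").add("Type");
    }

    public static void main(String[] args) {
        System.out.println(discarded().size()); // prints 0
        System.out.println(chained().size());   // prints 2
    }
}
```

With the real StructType the same rule applies: either chain the `add()` calls or reassign with `schema = schema.add(...)` on every line.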

Answer 2

You have already defined a schema for the JSON message...

val sparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("test")
    .getOrCreate()

val df: DataFrame = sparkSession
    .readStream
    .format("kafka")...

import org.apache.spark.sql.functions._
import sparkSession.implicits._

val ds = df.select($"value" cast "string" as "json")
        .select(from_json($"json", schema) as "data")
        .select("data.*")

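Note that append output mode is not supported when there are streaming aggregations on a streaming DF/DS without a watermark, so if you want to go wild with aggregations, remember to change the output mode along the following lines: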

val query = aggregations
           .writeStream
           .outputMode("complete")
           .format("console")
           .start()

query.awaitTermination()

Spark Structured Streaming with Kafka JSON input formatting in Java
