Creating a StructType from a String in Spark Streaming

Time: 2017-08-30 17:52:50

Tags: apache-spark apache-spark-sql spark-dataframe spark-streaming

In Spark Structured Streaming, I want to create a StructType from a String.

In the example below, the Spark read method accepts only a StructType for the schema. How can I create a StructType from a String? I want to convert the employeeSchema String into a StructType.

public static void main(String[] args) throws AnalysisException {
    String master = "local[*]";

    SparkSession sparkSession = SparkSession
            .builder().appName(EmployeeSchemaLoader.class.getName())
            .master(master).getOrCreate();

    String employeeSchema = "StructType(\n" +
            "StructField(firstName,StringType,true),\n" +
            "StructField(lastName,StringType,true),\n" +
            "StructField(addresses,\n" +
            "ArrayType(\n" +
            "StructType(\n" +
            "StructField(city,StringType,true), \n" +
            "StructField(state,StringType,true)\n" +
            "),\n" +
            "true),\n" +
            "true) \n" +
            ")";

    SparkContext context = sparkSession.sparkContext();
    context.setLogLevel("ERROR");
    SQLContext sqlCtx = sparkSession.sqlContext();
    Dataset<Row> employeeDataset = sparkSession.read()
            //.schema(employeeSchema)  // Accepts only Struct Type
            .json("simple_employees.json");

    employeeDataset.printSchema();
    employeeDataset.createOrReplaceTempView("employeeView");

    sparkSession.catalog().listTables().show();

    sqlCtx.sql("select * from employeeView").show();
}

2 Answers:

Answer 0 (score: 1)

I'm not sure why you want to do this. Instead of making employeeSchema a String, why not make it a StructType directly? Something like this:

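import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Build the schema with the Java API instead of keeping it as text.
StructType employeeSchema = new StructType()
    .add("firstName", DataTypes.StringType, true)
    .add("lastName", DataTypes.StringType, true)
    .add("addresses", DataTypes.createArrayType(
        new StructType()
            .add("city", DataTypes.StringType, true)
            .add("state", DataTypes.StringType, true),
        true), true);  // containsNull = true, nullable = true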

Answer 1 (score: 0)

from pyspark.sql.types import StructType

# inputdf is assumed to be an existing DataFrame
schema = inputdf.schema
print(type(inputdf.schema))

# just to display all methods available on schema
print(dir(schema))

new_schema = StructType.fromJson(schema.jsonValue())

print(type(new_schema))
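
Answer 1 round-trips a schema through Spark's JSON form, so one way to get from the question's text to a StructType is to turn that text into the same JSON form first. Below is a minimal sketch in plain Python, not a Spark API: the helper name `parse_schema_string` and the `PRIMITIVES` table are hypothetical, only `StructType`, `StructField`, `ArrayType` and a handful of primitive types are handled, and the input is assumed to look exactly like the `employeeSchema` string in the question.

```python
import json
import re

# Spark's JSON names for a few primitive types (assumed subset; extend as needed).
PRIMITIVES = {
    "StringType": "string",
    "IntegerType": "integer",
    "LongType": "long",
    "DoubleType": "double",
    "BooleanType": "boolean",
}

# Identifiers, parentheses and commas; whitespace and anything else is skipped.
TOKEN = re.compile(r"[A-Za-z_]\w*|[(),]")


def parse_schema_string(text):
    """Parse 'StructType(StructField(...), ...)' text into a JSON-style schema dict."""
    tokens = TOKEN.findall(text)
    pos = 0

    def take(expected=None):
        # Consume one token, optionally checking that it is the expected one.
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of schema string")
        tok = tokens[pos]
        if expected is not None and tok != expected:
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def parse_bool():
        tok = take()
        if tok not in ("true", "false"):
            raise ValueError(f"expected true/false, got {tok!r}")
        return tok == "true"

    def parse_field():
        # StructField(name, type, nullable)
        take("StructField")
        take("(")
        name = take()
        take(",")
        data_type = parse_type()
        take(",")
        nullable = parse_bool()
        take(")")
        return {"name": name, "type": data_type, "nullable": nullable, "metadata": {}}

    def parse_type():
        name = take()
        if name == "StructType":
            # StructType(field, field, ...)
            take("(")
            fields = [parse_field()]
            while pos < len(tokens) and tokens[pos] == ",":
                take(",")
                fields.append(parse_field())
            take(")")
            return {"type": "struct", "fields": fields}
        if name == "ArrayType":
            # ArrayType(elementType, containsNull)
            take("(")
            element = parse_type()
            take(",")
            contains_null = parse_bool()
            take(")")
            return {"type": "array", "elementType": element, "containsNull": contains_null}
        if name in PRIMITIVES:
            return PRIMITIVES[name]
        raise ValueError(f"unsupported type: {name}")

    schema = parse_type()
    if pos != len(tokens):
        raise ValueError("trailing tokens after schema")
    return schema


# The schema text from the question.
employee_schema = """StructType(
StructField(firstName,StringType,true),
StructField(lastName,StringType,true),
StructField(addresses,
ArrayType(
StructType(
StructField(city,StringType,true),
StructField(state,StringType,true)
),
true),
true)
)"""

schema_dict = parse_schema_string(employee_schema)
print(json.dumps(schema_dict, indent=2))
```

With PySpark installed, the resulting dict should be usable as `StructType.fromJson(schema_dict)`, the same call shown above. On the Java side, the JSON text from `json.dumps(schema_dict)` could presumably be passed to `DataType.fromJson(...)` and cast to `StructType`, though I have not tested that against the question's setup. If you control how the schema is stored, saving the JSON form in the first place avoids the custom parser entirely.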