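In Spark Structured Streaming, I want to create a StructType from a String.
In the example below, the Spark read method accepts only a StructType for the schema. How can I create a StructType from a String? I want to convert the employeeSchema String into a StructType.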
public static void main(String[] args) throws AnalysisException {
    String master = "local[*]";
    SparkSession sparkSession = SparkSession
            .builder().appName(EmployeeSchemaLoader.class.getName())
            .master(master).getOrCreate();

    // Schema held as a String; this is what I want to turn into a StructType.
    String employeeSchema = "StructType(\n" +
            "StructField(firstName,StringType,true),\n" +
            "StructField(lastName,StringType,true),\n" +
            "StructField(addresses,\n" +
            "ArrayType(\n" +
            "StructType(\n" +
            "StructField(city,StringType,true), \n" +
            "StructField(state,StringType,true)\n" +
            "),\n" +
            "true),\n" +
            "true) \n" +
            ")";

    SparkContext context = sparkSession.sparkContext();
    context.setLogLevel("ERROR");
    SQLContext sqlCtx = sparkSession.sqlContext();

    Dataset<Row> employeeDataset = sparkSession.read()
            //.schema(employeeSchema) // Accepts only StructType
            .json("simple_employees.json");

    employeeDataset.printSchema();
    employeeDataset.createOrReplaceTempView("employeeView");
    sparkSession.catalog().listTables().show();
    sqlCtx.sql("select * from employeeView").show();
}
Answer 0 (score: 1)
I'm not sure why you would want to do this. Instead of making employeeSchema a String, why not make it a StructType directly? Like this:
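import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Build the schema with the Java factory methods on DataTypes.
StructType employeeSchema = DataTypes.createStructType(new StructField[] {
    DataTypes.createStructField("firstName", DataTypes.StringType, true),
    DataTypes.createStructField("lastName", DataTypes.StringType, true),
    DataTypes.createStructField("addresses", DataTypes.createArrayType(
        DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("city", DataTypes.StringType, true),
            DataTypes.createStructField("state", DataTypes.StringType, true)
        }), true), true)
});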
Answer 1 (score: 0)
from pyspark.sql.types import StructType

# inputdf is assumed to be an existing DataFrame
schema = inputdf.schema
print(type(inputdf.schema))

# just to display all methods available on schema
print(dir(schema))

# round-trip the schema through its JSON (dict) representation
new_schema = StructType.fromJson(schema.jsonValue())
print(type(new_schema))