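I want to load simple JSON into my SparkSession using a schema in which an employee has an array of addresses. Sample JSON is below:
{"firstName":"Neil","lastName":"Irani", "addresses" : [ { "city" : "Brindavan", "state" : "NJ" }, { "city" : "Subala", "state" : "DT" }]}
I am trying to create a schema to load this JSON, and I believe something is wrong with the way I build the schema below. Please advise. The code below is Java; I could not find a reasonable sample.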
List<StructField> employeeFields = new ArrayList<>();
employeeFields.add(DataTypes.createStructField("firstName", DataTypes.StringType, true));
employeeFields.add(DataTypes.createStructField("lastName", DataTypes.StringType, true));
employeeFields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
List<StructField> addressFields = new ArrayList<>();
addressFields.add(DataTypes.createStructField("city", DataTypes.StringType, true));
addressFields.add(DataTypes.createStructField("state", DataTypes.StringType, true));
addressFields.add(DataTypes.createStructField("zip", DataTypes.StringType, true));
employeeFields.add(DataTypes.createStructField("addresses", DataTypes.createStructType(addressFields), true));
StructType employeeSchema = DataTypes.createStructType(employeeFields);
Dataset<Employee> rowDataset = sparkSession.read()
.option("inferSchema", "false")
.schema(employeeSchema)
.json("simple_employees.json").as(employeeEncoder);
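Update
I had not created an array type. In the JSON, "addresses" is an array of objects, so the field has to be an ArrayType wrapping the address StructType rather than the StructType itself. The code below works: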
List<StructField> employeeFields = new ArrayList<>();
employeeFields.add(DataTypes.createStructField("firstName", DataTypes.StringType, true));
employeeFields.add(DataTypes.createStructField("lastName", DataTypes.StringType, true));
employeeFields.add(DataTypes.createStructField("email", DataTypes.StringType, true));
List<StructField> addressFields = new ArrayList<>();
addressFields.add(DataTypes.createStructField("city", DataTypes.StringType, true));
addressFields.add(DataTypes.createStructField("state", DataTypes.StringType, true));
addressFields.add(DataTypes.createStructField("zip", DataTypes.StringType, true));
ArrayType addressStruct = DataTypes.createArrayType( DataTypes.createStructType(addressFields));
employeeFields.add(DataTypes.createStructField("addresses", addressStruct, true));
StructType employeeSchema = DataTypes.createStructType(employeeFields);
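
The question does not show the `Employee` class or how `employeeEncoder` is built. As a minimal sketch, assuming the encoder comes from something like `Encoders.bean(Employee.class)`, the bean would need a list-valued `addresses` property to line up with the array-of-struct field in the working schema. The `Address` class, the `EmployeeModelDemo` class, and the `sampleEmployee` helper are hypothetical names introduced only for illustration; the field names come from the schema above.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical bean for one element of the "addresses" array.
// Declared without `public` only so that everything fits in one file;
// in a real project each bean would typically be a public class.
class Address implements Serializable {
    private String city;
    private String state;
    private String zip;

    public Address() {} // no-arg constructor, as JavaBeans conventions expect

    public String getCity() { return city; }
    public void setCity(String city) { this.city = city; }
    public String getState() { return state; }
    public void setState(String state) { this.state = state; }
    public String getZip() { return zip; }
    public void setZip(String zip) { this.zip = zip; }
}

// Assumed shape of the Employee bean: "addresses" is a List<Address>,
// mirroring ArrayType(StructType(city, state, zip)) in the schema.
class Employee implements Serializable {
    private String firstName;
    private String lastName;
    private String email;
    private List<Address> addresses = new ArrayList<>();

    public Employee() {}

    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }
    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
    public List<Address> getAddresses() { return addresses; }
    public void setAddresses(List<Address> addresses) { this.addresses = addresses; }
}

class EmployeeModelDemo {
    // Builds the sample record from the question by hand (no Spark involved).
    static Employee sampleEmployee() {
        Employee e = new Employee();
        e.setFirstName("Neil");
        e.setLastName("Irani");
        e.setAddresses(Arrays.asList(
                address("Brindavan", "NJ"),
                address("Subala", "DT")));
        return e;
    }

    private static Address address(String city, String state) {
        Address a = new Address();
        a.setCity(city);
        a.setState(state);
        return a; // zip stays null, just as it is absent from the sample JSON
    }

    public static void main(String[] args) {
        Employee e = sampleEmployee();
        System.out.println(e.getFirstName() + " has "
                + e.getAddresses().size() + " addresses");
    }
}
```

In this sketch, fields that the sample JSON omits (`email`, `zip`) simply remain null, which matches the schema declaring every field as nullable.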