Creating a nested schema in Apache Spark SQL

Time: 2017-07-26 01:06:34

Tags: java json apache-spark apache-spark-sql apache-spark-dataset

I want to load simple JSON data into my SparkSession, where each employee has an array of addresses. A sample record is below:
    {"firstName":"Neil","lastName":"Irani", "addresses" : [ {  "city" : "Brindavan", "state" : "NJ"  }, {  "city" : "Subala", "state" : "DT"  }]}
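To see what Spark itself makes of this record, one option is to let it infer the schema and inspect the result. This is a minimal sketch, assuming Spark 2.2 or later (for `DataFrameReader.json(Dataset<String>)`) and a local master; the class name `InspectInferredSchema` is made up for illustration:

```java
import java.util.Collections;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.ArrayType;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper class: infers the schema of the sample record.
public class InspectInferredSchema {

    // The sample record from the question, as a single-line JSON string.
    static final String SAMPLE =
            "{\"firstName\":\"Neil\",\"lastName\":\"Irani\", \"addresses\" : ["
            + " {  \"city\" : \"Brindavan\", \"state\" : \"NJ\"  },"
            + " {  \"city\" : \"Subala\", \"state\" : \"DT\"  }]}";

    // Lets Spark infer a schema from the sample and returns it.
    static StructType inferSchema(SparkSession spark) {
        Dataset<String> lines =
                spark.createDataset(Collections.singletonList(SAMPLE), Encoders.STRING());
        // json(Dataset<String>) is assumed to be available (Spark 2.2+)
        Dataset<Row> df = spark.read().json(lines);
        return df.schema();
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("inspect-inferred-schema")
                .master("local[*]")
                .getOrCreate();
        StructType schema = inferSchema(spark);
        schema.printTreeString();   // prints the inferred schema as a tree
        DataType addressesType = schema.apply("addresses").dataType();
        System.out.println("addresses is an array: " + (addressesType instanceof ArrayType));
        spark.stop();
    }
}
```

On this record the inferred type of `addresses` is an array whose elements are structs, which is the shape a hand-written schema has to match.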

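I tried to create a schema to load my JSON, but I believe something is wrong with the way I build the schema below. Please advise. The code below is Java; I could not find a reasonable example.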

    List<StructField> employeeFields = new ArrayList<>();
    employeeFields.add(DataTypes.createStructField("firstName", DataTypes.StringType, true));
    employeeFields.add(DataTypes.createStructField("lastName", DataTypes.StringType, true));
    employeeFields.add(DataTypes.createStructField("email", DataTypes.StringType, true));

    List<StructField> addressFields = new ArrayList<>();
    addressFields.add(DataTypes.createStructField("city", DataTypes.StringType, true));
    addressFields.add(DataTypes.createStructField("state", DataTypes.StringType, true));
    addressFields.add(DataTypes.createStructField("zip", DataTypes.StringType, true));

    // "addresses" holds a JSON array, but here it is declared as a single struct
    employeeFields.add(DataTypes.createStructField("addresses", DataTypes.createStructType(addressFields), true));

    StructType employeeSchema = DataTypes.createStructType(employeeFields);


    Dataset<Employee> rowDataset = sparkSession.read()
            .option("inferSchema", "false")   // a CSV option with no effect on JSON; .schema(...) is what skips inference
            .schema(employeeSchema)
            .json("simple_employees.json").as(employeeEncoder);
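Because `addresses` is a JSON array, the schema needs an array of structs rather than a struct. Writing both variants as DDL strings makes the difference easy to see. This is a sketch assuming Spark 2.2 or later, where `StructType.fromDDL` is available; the class `DdlSchemas` and its constants are hypothetical names:

```java
import org.apache.spark.sql.types.StructType;

// Hypothetical class contrasting the two schema shapes as DDL strings.
public class DdlSchemas {

    // Shape of the first attempt: "addresses" is a single struct.
    static final String STRUCT_DDL =
            "firstName STRING, lastName STRING, email STRING, "
            + "addresses STRUCT<city: STRING, state: STRING, zip: STRING>";

    // Shape of the JSON data: "addresses" is an array of structs.
    static final String ARRAY_DDL =
            "firstName STRING, lastName STRING, email STRING, "
            + "addresses ARRAY<STRUCT<city: STRING, state: STRING, zip: STRING>>";

    public static void main(String[] args) {
        // StructType.fromDDL parses a comma-separated column list into a schema
        StructType asStruct = StructType.fromDDL(STRUCT_DDL);
        StructType asArray = StructType.fromDDL(ARRAY_DDL);
        System.out.println(asStruct.simpleString());
        System.out.println(asArray.simpleString());
    }
}
```

The array variant can be passed to `sparkSession.read().schema(...)` in place of a hand-built `StructType`.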

Update

I had not created an array type. The code below works correctly:

    List<StructField> employeeFields = new ArrayList<>();
    employeeFields.add(DataTypes.createStructField("firstName", DataTypes.StringType, true));
    employeeFields.add(DataTypes.createStructField("lastName", DataTypes.StringType, true));
    employeeFields.add(DataTypes.createStructField("email", DataTypes.StringType, true));

    List<StructField> addressFields = new ArrayList<>();
    addressFields.add(DataTypes.createStructField("city", DataTypes.StringType, true));
    addressFields.add(DataTypes.createStructField("state", DataTypes.StringType, true));
    addressFields.add(DataTypes.createStructField("zip", DataTypes.StringType, true));
    // Wrap the address struct in an ArrayType, because "addresses" is a JSON array
    ArrayType addressStruct = DataTypes.createArrayType(DataTypes.createStructType(addressFields));

    employeeFields.add(DataTypes.createStructField("addresses", addressStruct, true));
    StructType employeeSchema = DataTypes.createStructType(employeeFields);
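The question never shows `Employee` or `employeeEncoder`. Below is a minimal sketch of how the array-based schema could be paired with a bean encoder, assuming hypothetical JavaBean classes `Employee` and `Address` whose property names match the JSON keys, and a Spark version whose `Encoders.bean` maps a `List<Address>` property to an array of structs:

```java
import java.io.Serializable;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class LoadEmployees {

    // Hypothetical bean for one element of "addresses".
    public static class Address implements Serializable {
        private String city;
        private String state;
        private String zip;
        public String getCity() { return city; }
        public void setCity(String city) { this.city = city; }
        public String getState() { return state; }
        public void setState(String state) { this.state = state; }
        public String getZip() { return zip; }
        public void setZip(String zip) { this.zip = zip; }
    }

    // Hypothetical bean for one employee; List<Address> corresponds to the ArrayType column.
    public static class Employee implements Serializable {
        private String firstName;
        private String lastName;
        private String email;
        private List<Address> addresses;
        public String getFirstName() { return firstName; }
        public void setFirstName(String firstName) { this.firstName = firstName; }
        public String getLastName() { return lastName; }
        public void setLastName(String lastName) { this.lastName = lastName; }
        public String getEmail() { return email; }
        public void setEmail(String email) { this.email = email; }
        public List<Address> getAddresses() { return addresses; }
        public void setAddresses(List<Address> addresses) { this.addresses = addresses; }
    }

    // Same shape as the working schema, written with StructType.add.
    static StructType employeeSchema() {
        StructType address = new StructType()
                .add("city", DataTypes.StringType, true)
                .add("state", DataTypes.StringType, true)
                .add("zip", DataTypes.StringType, true);
        return new StructType()
                .add("firstName", DataTypes.StringType, true)
                .add("lastName", DataTypes.StringType, true)
                .add("email", DataTypes.StringType, true)
                .add("addresses", DataTypes.createArrayType(address), true);
    }

    // Reads JSON lines from path with the explicit schema and converts rows to Employee beans.
    static Dataset<Employee> load(SparkSession spark, String path) {
        Encoder<Employee> employeeEncoder = Encoders.bean(Employee.class);
        return spark.read()
                .schema(employeeSchema())
                .json(path)
                .as(employeeEncoder);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("load-employees")
                .master("local[*]")
                .getOrCreate();
        for (Employee e : load(spark, "simple_employees.json").collectAsList()) {
            int n = e.getAddresses() == null ? 0 : e.getAddresses().size();
            System.out.println(e.getFirstName() + " has " + n + " addresses");
        }
        spark.stop();
    }
}
```

`Dataset.as` maps bean properties to columns by name, so the beans' property order does not need to match the schema's field order; fields missing from a record, such as `email` and `zip` in the sample, come back as `null`.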
