Question

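Hello, I am trying to load a CSV file into a Spark DataFrame. I am using the Databricks CSV jar to load the data. I have the data schema in a JSON file and want to apply that schema to the DataFrame.

Below is my JSON schema file: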

 {
  "type" : "struct",
  "doc": "This is sample",
  "fields" : [ {
    "name" : "Name",
    "type" : "string" ,
    "nullable" : "true" 
  }, {
    "name" : "Address1",
    "type" : "string",
    "nullable" : "true" 
  }, {
    "name" : "Address2",
    "type" : "string",
    "nullable" : "true" 
  }, {
    "name" : "City",
    "type" : "string",
    "nullable" : "true" 
  }]
}

Answer 1

The following code may help you.

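import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.*;

// Build the schema by hand; the field names match those in the JSON schema file.
StructType tempSchema = new StructType(new StructField[]{
    new StructField("Name", DataTypes.StringType, true, Metadata.empty()),
    new StructField("Address1", DataTypes.StringType, true, Metadata.empty()),
    new StructField("Address2", DataTypes.StringType, true, Metadata.empty()),
    new StructField("City", DataTypes.StringType, true, Metadata.empty())
});

// Assumes spark is an existing SparkSession and dataRows holds the rows to load (Spark 2.x API).
Dataset<Row> resultDs = spark.createDataFrame(dataRows, tempSchema);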
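
The code above builds the schema by hand. Since the question is about a schema stored in a JSON file, here is a minimal sketch of parsing Spark's JSON schema representation into a StructType with DataType.fromJson. The class name SchemaFromJson, the helper field, and the inline SAMPLE_JSON string are hypothetical and only for illustration. As far as I know, the parser expects "nullable" to be a JSON boolean (true, not "true") and may reject extra keys such as "doc", so the sample is adjusted accordingly; treat that as an assumption to verify against your Spark version.

```java
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class SchemaFromJson {
    // Hypothetical inline copy of the schema file, adjusted for Spark's parser:
    // "nullable" is a JSON boolean, "metadata" is present, and "doc" is dropped.
    static final String SAMPLE_JSON =
        "{\"type\":\"struct\",\"fields\":["
      + field("Name") + "," + field("Address1") + ","
      + field("Address2") + "," + field("City") + "]}";

    // Builds the JSON for one nullable string column.
    static String field(String name) {
        return "{\"name\":\"" + name + "\",\"type\":\"string\","
             + "\"nullable\":true,\"metadata\":{}}";
    }

    // Parses Spark's JSON schema representation into a StructType.
    static StructType parseSchema(String json) {
        return (StructType) DataType.fromJson(json);
    }

    public static void main(String[] args) {
        StructType schema = parseSchema(SAMPLE_JSON);
        // Print each column's name, type name, and nullability.
        for (StructField f : schema.fields()) {
            System.out.println(f.name() + " " + f.dataType().typeName() + " " + f.nullable());
        }
    }
}
```

In a real job the JSON string would be read from the schema file instead of being inlined, and the resulting StructType would be passed to the reader's schema(...) method before load(...), with the format set to com.databricks.spark.csv when the Databricks CSV jar is used.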

How to load a schema from a JSON schema file into a Spark DataFrame in Spark 1.6
