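Hi, I'm trying to load a CSV file into a Spark DataFrame using the Databricks CSV jar. The data schema is stored in a JSON file, and I want to apply that schema to the DataFrame.
Here is my JSON schema file: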
{
  "type" : "struct",
  "doc": "This is sample",
  "fields" : [ {
    "name" : "Name",
    "type" : "string",
    "nullable" : "true"
  }, {
    "name" : "Address1",
    "type" : "string",
    "nullable" : "true"
  }, {
    "name" : "Address2",
    "type" : "string",
    "nullable" : "true"
  }, {
    "name" : "City",
    "type" : "string",
    "nullable" : "true"
  }]
}
Answer 0 (score: 0)
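The following code might help. It builds the schema by hand as a StructType and applies it when creating the DataFrame.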
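import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.*;

// Field names must match the JSON schema exactly ("Name", not "name").
StructType tempSchema = new StructType(new StructField[]{
    new StructField("Name", DataTypes.StringType, true, Metadata.empty()),
    new StructField("Address1", DataTypes.StringType, true, Metadata.empty()),
    new StructField("Address2", DataTypes.StringType, true, Metadata.empty()),
    new StructField("City", DataTypes.StringType, true, Metadata.empty())
});
// spark is an existing SparkSession; dataRows is a List<Row> or JavaRDD<Row> you already have.
Dataset<Row> resultDs = spark.createDataFrame(dataRows, tempSchema);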
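The answer above hard-codes the schema. If you would rather keep it in the JSON file, as the question asks, Spark can build a schema from JSON with `DataType.fromJson(String)` in `org.apache.spark.sql.types`. As far as I can tell from Spark's parser, though, it only accepts the layout that `StructType.json()` produces. `nullable` must be a JSON boolean (`true`, not `"true"`), and an extra key such as `doc` makes the parse fail. Below is a hypothetical, dependency-free helper that rewrites the question's file into that layout with two regular expressions. `SchemaJsonFixer.normalize` is my own name, not a Spark API. The helper assumes the file looks like the one in the question, with string-valued `nullable` entries and a `"doc"` entry followed by a comma. It is a minimal sketch and not a general JSON transformer.

```java
import java.util.regex.Pattern;

public class SchemaJsonFixer {

    // Matches "nullable" : "true" / "false" where the boolean was written as a string.
    private static final Pattern QUOTED_NULLABLE =
            Pattern.compile("\"nullable\"\\s*:\\s*\"(true|false)\"");

    // Matches a "doc" : "..." entry plus the comma (and whitespace) that follows it.
    // Assumption: "doc" is never the last key in its object, as in the question's file.
    private static final Pattern DOC_ENTRY =
            Pattern.compile("\"doc\"\\s*:\\s*\"[^\"]*\"\\s*,\\s*");

    /** Hypothetical helper: drops "doc" entries and unquotes string booleans in "nullable". */
    public static String normalize(String json) {
        String withoutDoc = DOC_ENTRY.matcher(json).replaceAll("");
        // $1 is the captured true/false, re-emitted without quotes.
        return QUOTED_NULLABLE.matcher(withoutDoc).replaceAll("\"nullable\" : $1");
    }

    public static void main(String[] args) {
        String original = "{ \"type\" : \"struct\", \"doc\": \"This is sample\", "
                + "\"fields\" : [ { \"name\" : \"Name\", \"type\" : \"string\", \"nullable\" : \"true\" } ] }";
        // Prints: { "type" : "struct", "fields" : [ { "name" : "Name", "type" : "string", "nullable" : true } ] }
        System.out.println(normalize(original));
    }
}
```

Under those assumptions, the Spark side would then look roughly like `StructType schema = (StructType) DataType.fromJson(SchemaJsonFixer.normalize(json));`, followed by `spark.read().schema(schema).option("header", "true").csv(path)`. A sturdier long-term fix is to generate the file from Spark itself: build the `StructType` once and save `tempSchema.json()`, which `DataType.fromJson` reads back without any rewriting.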