I have a single-line JSON file, for example:
{"Hotel Dream":{"Guests":20,"Address":"14 Naik Street","City":"Manila"},"Serenity Stay":{"Guests":35,"Address":"10 St Marie Road","City":"Manila"}....}
If I read the JSON into Spark with the following, I get this schema:
val hotelDF = sqlContext.read.json("file")
hotelDF.printSchema
root
 |-- Hotel Dream: struct (nullable = true)
 |    |-- Address: string (nullable = true)
 |    |-- City: string (nullable = true)
 |    |-- Guests: long (nullable = true)
 |-- Serenity Stay: struct (nullable = true)
 |    |-- Address: string (nullable = true)
 |    |-- City: string (nullable = true)
 |    |-- Guests: long (nullable = true)
I want to transform the separate columns (Hotel Dream, Serenity Stay, etc.) so that the DataFrame ends up with a normalized schema.
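I also tried reading the JSON with textFile or wholeTextFiles, but since the file contains no newline characters, I can't use map to process the content record by record.
Any input on how to read this kind of data format?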
Answer 0 (score: 0)
As far as I understand your question, the following could be a solution (though it is not a perfect one):
import org.apache.spark.sql.functions.{col, lit}

// Build one four-column DataFrame per hotel (one per top-level field of hotelDF).
val perHotel = hotelDF.schema.fieldNames.map { name =>
  hotelDF.select(
    lit(name).as("Hotel"),                     // the field name becomes the Hotel value
    col(s"`${name}`.Address").as("Address"),   // backticks protect names with spaces or dots
    col(s"`${name}`.City").as("City"),
    col(s"`${name}`.Guests").as("Guests")
  )
}
// Stack the per-hotel DataFrames; without a dummy seed row, Guests keeps its long type.
val newDataFrame = perHotel.reduce(_ union _)
newDataFrame.show
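If the file holds many hotels, one union per hotel can make the query plan large. A possible alternative, offered only as a sketch, is to wrap each hotel in a struct, collect the structs into an array, and explode that array in a single pass. The object name `NormalizeHotels` and the method `normalize` are hypothetical names chosen for illustration. The sketch assumes Spark 2.2 or later (for `spark.read.json` on a `Dataset[String]`), a local SparkSession, and that every hotel has the same three fields with the same types.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

// Hypothetical helper object, not part of Spark or of the original answer.
object NormalizeHotels {

  // Turn one struct column per hotel into one row per hotel.
  // Assumes every top-level field is a struct with Address, City and Guests.
  def normalize(hotelDF: DataFrame): DataFrame = {
    val hotels = hotelDF.schema.fieldNames.map { name =>
      struct(
        lit(name).as("Hotel"),                    // field name becomes the Hotel value
        col(s"`${name}`.Address").as("Address"),  // backticks protect names with spaces or dots
        col(s"`${name}`.City").as("City"),
        col(s"`${name}`.Guests").as("Guests")
      )
    }
    hotelDF
      .select(explode(array(hotels: _*)).as("hotel")) // one output row per array element
      .select("hotel.*")                              // expand the struct into top-level columns
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")            // assumption: local run for demonstration
      .appName("normalize-hotels")
      .getOrCreate()
    import spark.implicits._

    // Sample data with the same shape as the JSON in the question.
    val json =
      """{"Hotel Dream":{"Guests":20,"Address":"14 Naik Street","City":"Manila"},"Serenity Stay":{"Guests":35,"Address":"10 St Marie Road","City":"Manila"}}"""
    val hotelDF = spark.read.json(Seq(json).toDS)

    normalize(hotelDF).show(false)
    spark.stop()
  }
}
```

In an existing session, `NormalizeHotels.normalize(hotelDF)` can be called directly on the DataFrame read from the file. Because `array` requires all elements to share one type, this approach fails at analysis time if the hotels do not all have the same fields and types.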