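I have a JSON file whose keys become my columns when I load it into Spark SQL. When I retrieve the column names, they come back in alphabetical order, but I want them in the order in which they appear in the file.
My input data is: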
{"id":1,"name":"Judith","email":"jknight0@google.co.uk","city":"Évry","country":"France","ip":"199.63.123.157"}
Here is how I retrieve the column names and build a single string:
val dataframe = sqlContext.read.json("/virtual/home/587635/users.json")
val columns = dataframe.columns
var query = columns.apply(0)+" STRING"
for (a <- 1 to (columns.length-1))
{
query = query + ","+ columns.apply(a) + " STRING"
}
println(query)
This gives me the following output:
city STRING,country STRING,email STRING,id STRING,ip STRING,name STRING
But I want my output to be:
id STRING,name STRING,email STRING,city STRING,country STRING,ip STRING
Answer 0 (score: 1)
Add a select that lists the columns in the order you want:
val dataframe =
sqlContext
.read
.json("/tmp/test.jsn")
.select("id", "name", "email", "city", "country", "ip")
If you try this in the shell, you will see the columns in the correct order:
dataframe: org.apache.spark.sql.DataFrame = [id: bigint, name: string, email: string, city: string, country: string, ip: string]
Run the rest of the script and the output is as expected:
id STRING,name STRING,email STRING,city STRING,country STRING,ip STRING
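As a side note, the loop that builds the string can be written more compactly with map and mkString. The following is only a sketch in plain Scala, without Spark: the object name ColumnQuery and the function buildQuery are hypothetical, and the hard-coded list of names stands in for dataframe.columns after the select.

```scala
object ColumnQuery {
  // Hypothetical helper: appends " STRING" to each column name and joins
  // the results with commas, keeping the order of the input sequence.
  def buildQuery(columns: Seq[String]): String =
    columns.map(c => s"$c STRING").mkString(",")

  def main(args: Array[String]): Unit = {
    // Assumption: stands in for dataframe.columns after the select.
    val orderedColumns = Seq("id", "name", "email", "city", "country", "ip")
    println(buildQuery(orderedColumns))
    // prints: id STRING,name STRING,email STRING,city STRING,country STRING,ip STRING
  }
}
```

With a real DataFrame the call would be buildQuery(dataframe.columns), since dataframe.columns returns an Array[String] that converts to a Seq.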