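When I create a DataFrame from a JSON file, some of the information is missing from the DataFrame's rows and is treated as part of the schema instead. For example, the lines of test.json look like this: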
{"F":{"P3":"1:0.01","P8":"3:0.03,4:0.04", ...},"I":"blah1"}
{"F":{"P4":"2:0.01,3:0.02","P10":"5:0.02", ...},"I":"blah2"}
.....
.....
What I tried:
>>> df = spark.read.json("test.json")
>>> df.show()
+--------------------+--------------------+
| I | F |
+--------------------+--------------------+
|blah1 |[null,1:0.01, .... |
|blah2 |[2:0.01,3:0.02... |
+--------------------+--------------------+
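As you can see, the keys P3, P8, P4, ... are not included in the F column. They only show up as field names in the schema, when I run: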
>>> df.printSchema
<bound method DataFrame.printSchema of DataFrame[I: string,
F: struct<P3:string,P8:string,....
How can I correctly convert the JSON file into a DataFrame without losing this data?
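
To make the behaviour described above concrete, here is a minimal plain-Python sketch (no Spark involved) of what appears to be happening. It assumes that struct-style inference takes the union of all keys under F as the field list and stores only the values in each row. The two sample lines are the ones from the question with the elided keys left out, and the variable names `struct_fields`, `struct_rows` and `map_rows` are made up for illustration.

```python
import json

# Two records shaped like the lines of test.json (the elided "..." keys are omitted).
lines = [
    '{"F":{"P3":"1:0.01","P8":"3:0.03,4:0.04"},"I":"blah1"}',
    '{"F":{"P4":"2:0.01,3:0.02","P10":"5:0.02"},"I":"blah2"}',
]
records = [json.loads(line) for line in lines]

# Struct-style view (assumption): the field list is the union of keys seen in any record.
struct_fields = sorted({key for rec in records for key in rec["F"]})

# Each row then holds only values, in field order, with None where a key is absent.
struct_rows = [[rec["F"].get(name) for name in struct_fields] for rec in records]

# Map-style view: each row keeps its own keys next to its values.
map_rows = [dict(rec["F"]) for rec in records]

print(struct_fields)   # field names live in the "schema"
print(struct_rows[0])  # values only, padded with None
print(map_rows[0])     # keys and values stay together in the row
```

In the struct-style view the key names are not lost, they are just held once in the field list rather than repeated in every row. If the keys need to travel with each row, one option that may be worth trying in PySpark is to skip inference and pass an explicit schema in which F is a `MapType(StringType(), StringType())` via `spark.read.schema(...)`. This has not been tested against the file in the question.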