I am reading a JSON file with the following schema:
root
|-- events: struct (nullable = true)
| |-- profile: struct (nullable = true)
| | |-- clusters: struct (nullable = true)
| | | |-- 10: long (nullable = true)
| | | |-- 102: long (nullable = true)
| | | |-- 105: long (nullable = true)
| | | |-- 106: long (nullable = true)
| | | |-- 109: long (nullable = true)
| | | |-- 110: long (nullable = true)
I need to build a DataFrame by selecting the nested fields.
spark.read.format("json").
option("compression","gzip").
load("datamining_20191001-000000_24.json.gz").
select("events.profile.clusters.*").limit(5).show()
However, the schema has since changed, which makes things trickier:
root
|-- events: struct (nullable = true)
| |-- profile: struct (nullable = true)
| | |-- segments: struct (nullable = true)
| | | |-- 10: long (nullable = true)
| | | |-- 102: long (nullable = true)
| | | |-- 105: long (nullable = true)
| | | |-- 106: long (nullable = true)
| | | |-- 109: long (nullable = true)
| | | |-- 110: long (nullable = true)
How can I modify the code so that it reads the segments and clusters fields as the same column?
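For illustration, here is a minimal sketch of one possible approach, not a definitive answer. It assumes each file contains exactly one of the two fields (either clusters or segments) under events.profile, and that the code runs somewhere a SparkSession can be obtained. The names df, profileFields, and nested are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell a session named `spark` already exists; getOrCreate() reuses it.
val spark = SparkSession.builder().getOrCreate()

val df = spark.read.format("json").
  option("compression", "gzip").
  load("datamining_20191001-000000_24.json.gz")

// List the field names nested directly under events.profile in this file.
val profileFields = df.select("events.profile.*").columns

// Assumption: the file has either "clusters" (old schema) or "segments" (new schema).
val nested = if (profileFields.contains("clusters")) "clusters" else "segments"

// Expand whichever struct is present into top-level columns.
df.select(s"events.profile.$nested.*").limit(5).show()
```

If a single load covers files with both schemas, both structs would appear in the inferred schema and this either/or check would no longer be enough.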