Spark: reading JSON with a changed schema

Date: 2019-10-02 10:05:40

Tags: json scala apache-spark apache-spark-sql

I am reading a JSON file with the following schema:

    root
     |-- events: struct (nullable = true)
     |    |-- profile: struct (nullable = true)
     |    |    |-- clusters: struct (nullable = true)
     |    |    |    |-- 10: long (nullable = true)
     |    |    |    |-- 102: long (nullable = true)
     |    |    |    |-- 105: long (nullable = true)
     |    |    |    |-- 106: long (nullable = true)
     |    |    |    |-- 109: long (nullable = true)
     |    |    |    |-- 110: long (nullable = true)

I need to build a DataFrame with a nested select:

    spark.read.format("json")
      .option("compression", "gzip")
      .load("datamining_20191001-000000_24.json.gz")
      .select("events.profile.clusters.*").limit(5).show()

However, the schema has changed slightly, which makes things trickier:

    root
     |-- events: struct (nullable = true)
     |    |-- profile: struct (nullable = true)
     |    |    |-- segments: struct (nullable = true)
     |    |    |    |-- 10: long (nullable = true)
     |    |    |    |-- 102: long (nullable = true)
     |    |    |    |-- 105: long (nullable = true)
     |    |    |    |-- 106: long (nullable = true)
     |    |    |    |-- 109: long (nullable = true)
     |    |    |    |-- 110: long (nullable = true)

How can I modify the code so that it reads the `segments` and `clusters` fields as the same column?

0 Answers
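Although the question received no answers, one common workaround (a sketch, not from the post) is to inspect the loaded DataFrame's schema and select whichever of the two known field names is actually present. The helper below is plain Scala; the Spark calls that would use it are shown only in comments, since they assume the `spark` session and input file from the question.

```scala
// Given the field names found under events.profile, pick whichever of the
// two known struct names (from the question's two schemas) is present.
// Preference order is arbitrary here: "clusters" first, then "segments".
def pickField(profileFields: Set[String]): Option[String] =
  Seq("clusters", "segments").find(profileFields.contains)

// Sketch of the Spark side (assumes the question's `spark` session and file):
//   val df = spark.read.format("json")
//     .option("compression", "gzip")
//     .load("datamining_20191001-000000_24.json.gz")
//   val fields = df.select("events.profile.*").columns.toSet
//   pickField(fields).foreach { f =>
//     df.select(s"events.profile.$f.*").limit(5).show()
//   }
```

Because the chosen name is interpolated into a single `select`, downstream code sees the same columns (`10`, `102`, ...) regardless of whether the file used `clusters` or `segments`.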