I have a dataset with the following schema:
root
 |-- schema_version: integer (nullable = false)
 |-- countries: array (nullable = false)
 |    |-- element: struct (containsNull = true)
 |    |    |-- country_name: binary (nullable = false)
 |    |    |-- cities: array (nullable = false)
 |    |    |    |-- element: struct (containsNull = false)
 |    |    |    |    |-- city_name: binary (nullable = false)
 |    |    |    |    |-- city_population: long (nullable = true)
How can I access or transform the data in this schema so that I end up with a DataFrame like the following?
| country_name | city_name | population |
-----------------------------------------
| | | |
| | | |
| | | |
How do I handle the arrays in the schema?
Answer 0 (score: 0)
I think all you need to do is:
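import org.apache.spark.sql.functions.explode
import spark.implicits._ // for the $"..." syntax; assumes a SparkSession named spark
originalDf
  .select(explode($"countries").as("explode_countries"))
  // Carry country_name forward here: explode_countries is not visible after this select.
  .select($"explode_countries.country_name".as("country_name"),
    explode($"explode_countries.cities").as("explode_cities"))
  .select($"country_name",
    $"explode_cities.city_name".as("city_name"),
    $"explode_cities.city_population".as("population"))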
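
Here is a minimal end-to-end sketch of that flattening in case it helps to try it locally. The case classes `City`, `Country` and `Record`, the object name `FlattenCountries`, and the sample values are all made up for illustration and only mirror the schema in the question. Since `country_name` and `city_name` are binary, the sketch also casts them to string, which assumes the bytes are UTF-8 text.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical case classes that mirror the schema from the question.
case class City(city_name: Array[Byte], city_population: Option[Long])
case class Country(country_name: Array[Byte], cities: Seq[City])
case class Record(schema_version: Int, countries: Seq[Country])

object FlattenCountries {
  // One output row per (country, city) pair.
  def flatten(df: DataFrame): DataFrame =
    df.select(explode(col("countries")).as("country"))
      .select(
        // Assumption: the binary columns hold UTF-8 text, so a string cast is meaningful.
        col("country.country_name").cast("string").as("country_name"),
        explode(col("country.cities")).as("city"))
      .select(
        col("country_name"),
        col("city.city_name").cast("string").as("city_name"),
        col("city.city_population").as("population"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatten-countries")
      .getOrCreate()
    import spark.implicits._

    def bytes(s: String): Array[Byte] = s.getBytes("UTF-8")

    // Placeholder sample data, not real figures.
    val originalDf = Seq(
      Record(1, Seq(
        Country(bytes("CountryA"), Seq(
          City(bytes("CityA1"), Some(100L)),
          City(bytes("CityA2"), None))),
        Country(bytes("CountryB"), Seq(
          City(bytes("CityB1"), Some(200L))))))
    ).toDF()

    flatten(originalDf).show()
    spark.stop()
  }
}
```

Note that `explode` drops rows whose array is null or empty, so a country without cities disappears from the result. If such countries should be kept with null city columns, `explode_outer` (available since Spark 2.2) can be used in place of `explode`.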