How do I convert a nested schema into columns in Java Structured Streaming?

Time: 2019-07-15 07:48:17

Tags: java apache-spark schema spark-structured-streaming

I have a Dataset with the following schema:

root
 |-- schema_version: integer (nullable = false)
 |-- countries: array (nullable = false)
 |    |-- element: struct (containsNull = true)
 |    |    |-- country_name: binary (nullable = false)
 |    |    |-- cities: array (nullable = false)
 |    |    |    |-- element: struct (containsNull = false)
 |    |    |    |    |-- city_name: binary (nullable = false)
 |    |    |    |    |-- city_population: long (nullable = true)

How can I access or transform the data in this schema so that I end up with a DataFrame like the following?

 | country_name | city_name | population |
 -----------------------------------------
 |              |           |            |
 |              |           |            |
 |              |           |            |

How do I handle the arrays in the schema?

1 Answer:

Answer 0 (score: 0)

I think all you need to do is:

// One row per country.
originalDf.select(explode($"countries").as("explode_countries"))
          // One row per city; country_name is kept so the last select can still see it.
          .select($"explode_countries.country_name".as("country_name"),
                  explode($"explode_countries.cities").as("explode_cities"))
          .select($"country_name",
                  $"explode_cities.city_name".as("city_name"),
                  $"explode_cities.city_population".as("population"))
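
The snippet above uses Scala syntax, while the question asks about Java. As a rough Java equivalent, here is a minimal sketch that explodes one array level per select and carries country_name along when exploding cities, so it is still available in the final projection. The class name FlattenCountries and the method flatten are hypothetical, and the cast to string is an assumption that the binary name columns hold UTF-8 text; drop the casts if you need the raw bytes.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Hypothetical helper class; the name is for illustration only.
public final class FlattenCountries {
    private FlattenCountries() {}

    /** Returns one row per (country, city) pair of the nested input. */
    public static Dataset<Row> flatten(Dataset<Row> originalDf) {
        return originalDf
            // One row per element of the `countries` array.
            .select(explode(col("countries")).as("country"))
            // One row per element of `cities`, keeping country_name next to it.
            .select(
                col("country.country_name").as("country_name"),
                explode(col("country.cities")).as("city"))
            // Assumption: the binary columns contain UTF-8 text, so cast them to string.
            .select(
                col("country_name").cast("string").as("country_name"),
                col("city.city_name").cast("string").as("city_name"),
                col("city.city_population").as("population"));
    }
}
```

Calling FlattenCountries.flatten(originalDf) should work on a streaming Dataset as well as a batch one, since explode is a stateless row-level operation. Note that explode drops rows whose array is empty; if countries without any cities must be kept, explode_outer from the same functions class may be a better fit.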