Spark DataFrame reads Hive struct types as arrays instead of Maps

Time: 2019-06-20 08:15:25

Tags: apache-spark apache-spark-sql

I have a basic question about how Hive struct types are read into a Spark DataFrame. For example, I have a Hive table like the following:

user_id (string)
current_address (struct<city:string,state:string>)
previous_address (array<struct<city:string,state:string>>)



+--------------+------------------------------+-----------------------------------------------------------------+
| user_id      | current_address              |  previous_address                                               |
+--------------+------------------------------+-----------------------------------------------------------------+
| 1            |{"city":"Tampa","state":"FL"} | [{"city":"Newark","state":"NJ"},{"city":"Denver","state":"CO"}] |
+--------------+------------------------------+-----------------------------------------------------------------+
| 2            |{"city":"NY","state":"NY"}    | [{"city":"Austin","state":"TX"}]                                |
+--------------+------------------------------+-----------------------------------------------------------------+

Spark SQL reads it as a DataFrame like this:

root
 |-- user_id: string (nullable = true)
 |-- current_address: struct (nullable = true)
 |    |-- city: string (nullable = true)
 |    |-- state: string (nullable = true)
 |-- previous_address: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- city: string (nullable = true)
 |    |    |-- state: string (nullable = true)

+--------------+-----------------+--------------------------+
| user_id      | current_address |  previous_address        |
+--------------+-----------------+--------------------------+
| 1            |[Tampa,FL]       |[[Newark,NJ],[Denver,CO]] |
+--------------+-----------------+--------------------------+
| 2            |[NY,NY]          | [[Austin,TX]]            |
+--------------+-----------------+--------------------------+

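It looks like the Hive struct types are being read as arrays. My plan is to later convert the DataFrame contents to Maps and work with the keys and their values.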
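For context, the `printSchema` output above reports `current_address` as a struct, so the bracketed `[Tampa,FL]` appears to be only how `show()` renders a struct, not an actual array. A minimal sketch of reading struct fields by name follows. The case classes, the in-memory sample data and the names `StructFieldAccess` and `firstCurrentAddress` are assumptions made for illustration, standing in for the real Hive table.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object StructFieldAccess {
  // Hypothetical case classes that mirror the schema shown above.
  case class Address(city: String, state: String)
  case class User(user_id: String, current_address: Address, previous_address: Seq[Address])

  // Returns the current_address struct of user "1" as a Scala Map.
  def firstCurrentAddress(spark: SparkSession): Map[String, String] = {
    import spark.implicits._

    // In-memory stand-in for the Hive table.
    val df = Seq(
      User("1", Address("Tampa", "FL"), Seq(Address("Newark", "NJ"), Address("Denver", "CO"))),
      User("2", Address("NY", "NY"), Seq(Address("Austin", "TX")))
    ).toDF()

    // Struct fields can be selected by name; on an array of structs
    // this yields an array of the selected field's values.
    df.select($"user_id", $"current_address.city", $"previous_address.state").show(false)

    // On the driver, a struct column arrives as a Row that still carries its field names.
    val current = df.filter($"user_id" === "1").head().getAs[Row]("current_address")
    current.getValuesMap[String](current.schema.fieldNames.toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("struct-field-access").getOrCreate()
    try println(firstCurrentAddress(spark)) finally spark.stop()
  }
}
```

Here `getValuesMap` builds the Map on the driver from a collected `Row`, so it suits small results; it does not change the column type inside the DataFrame.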

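How can I make Spark read these struct fields (i.e. current_address and previous_address) as key-value Maps rather than as an array and an array of arrays, so that I get Map[String, String] and WrappedArray[Map[String, String]] instead of Array[String] and an array of arrays?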
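One possible approach, sketched here under assumptions rather than offered as a confirmed answer, is to rebuild each struct as a map column with the `map` function and the SQL higher-order function `transform`. The sketch assumes Spark 2.4 or later (for `transform`) and that the field names `city` and `state` are known in advance. The names `StructToMap`, `addressesAsMaps` and `sample` are hypothetical, and the one-row sample built from SQL literals stands in for the Hive table.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr, lit, map}

object StructToMap {
  // Rebuilds both address columns as map<string,string> values.
  def addressesAsMaps(df: DataFrame): DataFrame = df.select(
    col("user_id"),
    map(
      lit("city"), col("current_address.city"),
      lit("state"), col("current_address.state")
    ).as("current_address"),
    // transform is a Spark SQL higher-order function (assumes Spark 2.4 or later).
    expr("transform(previous_address, a -> map('city', a.city, 'state', a.state))")
      .as("previous_address")
  )

  // One-row stand-in for the Hive table, built with SQL literals.
  def sample(spark: SparkSession): DataFrame = spark.sql(
    """SELECT '1' AS user_id,
      |       named_struct('city', 'Tampa', 'state', 'FL') AS current_address,
      |       array(named_struct('city', 'Newark', 'state', 'NJ'),
      |             named_struct('city', 'Denver', 'state', 'CO')) AS previous_address""".stripMargin)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("struct-to-map").getOrCreate()
    try {
      val converted = addressesAsMaps(sample(spark))
      converted.printSchema()

      val row = converted.head()
      val current = row.getAs[scala.collection.Map[String, String]]("current_address")
      val previous =
        row.getAs[scala.collection.Seq[scala.collection.Map[String, String]]]("previous_address")
      println(current("city"))
      println(previous.map(m => m("state")))
    } finally spark.stop()
  }
}
```

Unlike collecting a `Row` and converting it on the driver, this changes the column types inside the DataFrame, so later transformations can use map lookups directly. One caveat: a row whose struct is null would likely produce a map with null values rather than a null map.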

0 answers:

No answers yet