How to create an external Hive table from Parquet files containing data with complex (struct) data types

Time: 2018-01-13 19:36:11

Tags: apache-spark struct hive parquet

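I have a set of Parquet files containing the data for a table named people. The data in these Parquet files includes complex data types such as structs and arrays. The schema of the data in the Parquet files is attached below.

SCHEMA: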

 |-- distinct_id: string (nullable = true)
 |-- android_app_version: string (nullable = true)
 |-- android_app_version_code: string (nullable = true)
 |-- android_brand: string (nullable = true)
 |-- android_devices: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- android_lib_version: string (nullable = true)
 |-- android_manufacturer: string (nullable = true)
 |-- android_os: string (nullable = true)
 |-- android_os_version: string (nullable = true)
 |-- android_push_error: string (nullable = true)
 |-- browser: string (nullable = true)
 |-- browser_version: double (nullable = true)
 |-- campaigns: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- country_code: string (nullable = true)
 |-- deliveries: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- initial_referrer: string (nullable = true)
 |-- initial_referring_domain: string (nullable = true)
 |-- ios_app_release: string (nullable = true)
 |-- ios_app_version: string (nullable = true)
 |-- ios_device_model: string (nullable = true)
 |-- ios_devices: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- ios_lib_version: string (nullable = true)
 |-- ios_version: string (nullable = true)
 |-- last_seen: string (nullable = true)
 |-- notifications: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- $time: string (nullable = true)
 |    |    |-- campaign_id: long (nullable = true)
 |    |    |-- message_id: long (nullable = true)
 |    |    |-- message_subtype: string (nullable = true)
 |    |    |-- message_type: string (nullable = true)
 |    |    |-- time: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |-- os: string (nullable = true)
 |-- predict_grade: string (nullable = true)
 |-- region: string (nullable = true)
 |-- swift_lib_version: string (nullable = true)
 |-- timezone: string (nullable = true)
 |-- area: string (nullable = true)
 |-- country: string (nullable = true)
 |-- dob: string (nullable = true)
 |-- date: string (nullable = true)
 |-- default_languages: string (nullable = true)
 |-- email: string (nullable = true)
 |-- first_app_launch: string (nullable = true)
 |-- first_app_launch_date: string (nullable = true)
 |-- first_login: boolean (nullable = true)
 |-- gaid: string (nullable = true)
 |-- lr_age: string (nullable = true)
 |-- lr_birthdate: string (nullable = true)
 |-- lr_country: string (nullable = true)
 |-- lr_gender: string (nullable = true)
 |-- language: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- languages: string (nullable = true)
 |-- languages_disabled: string (nullable = true)
 |-- languages_selected: string (nullable = true)
 |-- launched: string (nullable = true)
 |-- location: string (nullable = true)
 |-- media_id: string (nullable = true)
 |-- no_of_logins: long (nullable = true)
 |-- pop-strata: string (nullable = true)
 |-- price: string (nullable = true)
 |-- random_number: long (nullable = true)
 |-- second_name: string (nullable = true)
 |-- state: string (nullable = true)
 |-- state_as_per_barc: string (nullable = true)
 |-- total_app_opens: long (nullable = true)
 |-- total_app_sessions: string (nullable = true)
 |-- total_sessions: string (nullable = true)
 |-- town: string (nullable = true)
 |-- user_type: string (nullable = true)
 |-- userid: string (nullable = true)
 |-- appversion: string (nullable = true)
 |-- birthdate: string (nullable = true)
 |-- campaign: string (nullable = true)
 |-- city: string (nullable = true)
 |-- media_source: string (nullable = true)
 |-- last_name: string (nullable = true)
 |-- first_name: string (nullable = true)
 |-- ios_ifa: string (nullable = true)
 |-- android_model: string (nullable = true)
 |-- age: string (nullable = true)
 |-- uid: string (nullable = true)

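What I ultimately want is to create a Hive external table that points to the data in the Parquet files. One solution might be to flatten the structs, or to use SQL explode to break them out into individual columns, but when I do that I end up with null values for every column that was originally of struct type. The Parquet files are located in an Azure Blob storage location.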
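As a rough illustration of the external-table goal described above, the sketch below renders a `CREATE EXTERNAL TABLE ... STORED AS PARQUET` statement from a nested schema description. It is only a sketch under assumptions: the helper names (`hive_type`, `create_external_table`) are made up for this example, only a subset of the columns is listed, the `wasbs://` location is a placeholder, and whether a given Hive version accepts backtick-quoted struct field names such as `$time` should be verified.

```python
# Sketch: build a Hive DDL string from a nested schema description.
# Primitive names follow the printSchema output; values are Hive type names.
PRIMITIVES = {
    "string": "STRING",
    "long": "BIGINT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
}


def hive_type(spec):
    """Translate a schema spec into a Hive type string.

    spec is a primitive name ("string"), ("array", element_spec),
    or ("struct", [(field_name, field_spec), ...]).
    """
    if isinstance(spec, str):
        return PRIMITIVES[spec]
    kind, inner = spec
    if kind == "array":
        return "ARRAY<{}>".format(hive_type(inner))
    if kind == "struct":
        fields = ",".join(
            "`{}`:{}".format(name, hive_type(t)) for name, t in inner
        )
        return "STRUCT<{}>".format(fields)
    raise ValueError("unsupported type: {!r}".format(kind))


def create_external_table(table, columns, location):
    """Render the CREATE EXTERNAL TABLE statement (hypothetical helper)."""
    cols = ",\n".join(
        "  `{}` {}".format(name, hive_type(t)) for name, t in columns
    )
    return (
        "CREATE EXTERNAL TABLE {} (\n{}\n)\n"
        "STORED AS PARQUET\n"
        "LOCATION '{}'"
    ).format(table, cols, location)


# Element type of the notifications array, copied from the schema above.
NOTIFICATION = ("struct", [
    ("$time", "string"),
    ("campaign_id", "long"),
    ("message_id", "long"),
    ("message_subtype", "string"),
    ("message_type", "string"),
    ("time", "string"),
    ("type", "string"),
])

# Only a subset of the people columns, to keep the example short.
PEOPLE_COLUMNS = [
    ("distinct_id", "string"),
    ("android_devices", ("array", "string")),
    ("browser_version", "double"),
    ("campaigns", ("array", "long")),
    ("notifications", ("array", NOTIFICATION)),
    ("first_login", "boolean"),
    ("pop-strata", "string"),
]

# Placeholder location: container, account, and path are not known here.
LOCATION = "wasbs://<container>@<account>.blob.core.windows.net/<path>"

ddl = create_external_table("people", PEOPLE_COLUMNS, LOCATION)
print(ddl)
```

Declaring `notifications` as `ARRAY<STRUCT<...>>` keeps the nested data in place instead of flattening it first. Hive's Parquet reader matches columns to the files by name, so a declared name or nested type that differs from what the files contain is one possible source of NULLs; that is something to check, not a confirmed diagnosis of the problem here.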

I tried loading the Parquet files into a DataFrame in Spark SQL, but it returns null values for the columns with complex data types:

[Screenshot of the DataFrame output; image not available]
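To make the flatten/explode approach mentioned earlier concrete, here is a plain-Python model (not Spark code) of what exploding the `notifications` array would be expected to produce: one flat row per array element, with the struct fields promoted to columns. The sample rows and the function name `explode_notifications` are invented for illustration. The `outer` flag mirrors the difference between Spark SQL's `explode`, which drops rows whose array is NULL or empty, and `explode_outer`, which keeps them with NULL values.

```python
def explode_notifications(row, outer=False):
    """Mimic exploding an array-of-struct column into flat rows."""
    # Every column except the array is repeated on each output row.
    base = {k: v for k, v in row.items() if k != "notifications"}
    items = row.get("notifications") or []
    if not items:
        # explode-like: no output rows; explode_outer-like: one row of NULLs.
        if outer:
            return [dict(base, campaign_id=None, message_type=None)]
        return []
    return [
        dict(
            base,
            campaign_id=n.get("campaign_id"),
            message_type=n.get("message_type"),
        )
        for n in items
    ]


# Made-up sample rows shaped like the people schema (not real data).
rows = [
    {
        "distinct_id": "a",
        "notifications": [
            {"campaign_id": 1, "message_type": "push"},
            {"campaign_id": 2, "message_type": "email"},
        ],
    },
    {"distinct_id": "b", "notifications": None},
]

flat = [out for r in rows for out in explode_notifications(r)]
flat_outer = [
    out for r in rows for out in explode_notifications(r, outer=True)
]

for out in flat_outer:
    print(out)
```

If the flattened columns are NULL on every row, including rows known to have notifications, the cause is more likely in how the nested column is being read than in the explode step itself, since explode only rearranges values that were already loaded.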

0 answers:

No answers