Apache Hive - complex data type map<string,struct<...>> not working as expected

Date: 2017-08-12 06:10:09

Tags: hive

Hive Version 2.1.1

Problem: values separated by the collection-item terminator are inserted as map keys.

Hive table:

CREATE TABLE profiles(
id int,
name struct<first_name: string, middle_name: string, last_name: string>,
phone struct<home: string, office: string>,
address map<string,struct<streat:string, appartment:int, zip:string>>
) 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '-'
MAP KEYS TERMINATED BY '='
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

Data:

1000,Suresh--S,1234567890-1234567890,home=Venkatapuram1-2020-500001
1001,Mahesh-X-M,1234567890-1234567890,home=Venkatapuram2-2021-500001

Loading the data:

load data inpath '/handson/profiles_data.txt' overwrite into table profiles;

Actual output of the SELECT statement:

SELECT * FROM profiles; 

1000
{"first_name":"Suresh","middle_name":"","last_name":"S"}
{"home":"1234567890","office":"1234567890"}
{"home":{"streat":"Venkatapuram1","appartment":null,"zip":null},"2020":null,"500001":null}

1001
{"first_name":"Mahesh","middle_name":"X","last_name":"M"}
{"home":"1234567890","office":"1234567890"}
{"home":{"streat":"Venkatapuram2","appartment":null,"zip":null},"2021":null,"500001":null}

Expected:

1000
{"first_name":"Suresh","middle_name":"","last_name":"S"}
{"home":"1234567890","office":"1234567890"}
{"home":{"streat":"Venkatapuram1","appartment":2020,"zip":"500001"}}

1001
{"first_name":"Mahesh","middle_name":"X","last_name":"M"}
{"home":"1234567890","office":"1234567890"}
{"home":{"streat":"Venkatapuram2","appartment":2021,"zip":"500001"}}

1 answer:

Answer 0 (score: 0)

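As answered in "HIVE nested ARRAY in MAP data type", you can override only the first three delimiters in Hive, although Hive actually supports eight. In nested data structures, each additional nesting level uses the next delimiter in the sequence.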

In your Hive table, the delimiter between the fields of the struct inside the address map is \u0004 (Unicode code point 4), and it cannot be overridden.
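To see why the extra map keys appear, the level-by-level splitting can be imitated in a few lines of Python. This is only a simplified model of the nesting rule described above, not Hive's actual SerDe code: the names SEPARATORS, parse_struct and parse_address are hypothetical, values are left as strings (no int conversion for appartment), and the separator list assumes the table definition from the question.

```python
# Simplified model of level-by-level splitting; NOT Hive's implementation.
# Separators by nesting level for the table in the question:
#   level 0 = FIELDS (','), level 1 = COLLECTION ITEMS ('-'),
#   level 2 = MAP KEYS ('='), level 3 = next built-in separator (U+0004).
SEPARATORS = [",", "-", "=", "\x04"]

STRUCT_FIELDS = ["streat", "appartment", "zip"]


def parse_struct(text, level):
    """Split a struct value on the separator for `level`.

    Missing trailing fields become None; a missing value gives None.
    """
    if text is None:
        return None
    parts = text.split(SEPARATORS[level])
    parts += [None] * (len(STRUCT_FIELDS) - len(parts))
    return dict(zip(STRUCT_FIELDS, parts))


def parse_address(column_text):
    """Parse a top-level map<string,struct<...>> column.

    Map entries are split at level 1, key and value at level 2,
    and the struct fields inside each value at level 3.
    """
    result = {}
    for entry in column_text.split(SEPARATORS[1]):
        key, sep, value = entry.partition(SEPARATORS[2])
        # An entry without '=' is a key with no value at all.
        result[key] = parse_struct(value if sep else None, 3)
    return result


# Original data: '-' is consumed as the map-entry separator.
broken = parse_address("home=Venkatapuram1-2020-500001")

# Data using U+0004 between struct fields: one entry, three fields.
fixed = parse_address("home=Venkatapuram1\x042020\x04500001")
```

With the original data, '-' splits the column into three map entries, so "2020" and "500001" become keys with null values, which matches the actual SELECT output in the question.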

You should change your input to:

1000,Suresh--S,1234567890-1234567890,home=Venkatapuram1\u00042020\u0004500001 
1001,Mahesh-X-M,1234567890-1234567890,home=Venkatapuram2\u00042021\u0004500001
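Note that \u0004 above stands for the single control character U+0004, not the six literal characters backslash, u, 0, 0, 0, 4. A minimal Python sketch for producing such a file from the original data follows. It assumes the address column is the fourth comma-separated field, that no value contains a comma, and that each row holds a single map entry (with several entries, the '-' between entries would have to stay). The function names and file names are hypothetical.

```python
STRUCT_SEP = "\x04"  # the control character U+0004, written as an escape


def fix_line(line):
    """Replace '-' with U+0004 in the address column (4th field) only."""
    fields = line.rstrip("\n").split(",")
    fields[3] = fields[3].replace("-", STRUCT_SEP)
    return ",".join(fields)


def convert(src_path, dst_path):
    """Rewrite every non-empty line of src_path into dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for line in src:
            if line.strip():
                dst.write(fix_line(line) + "\n")
```

Calling convert("profiles_data.txt", "profiles_data_fixed.txt") on a local copy of the data would write the corrected rows, which can then be uploaded and loaded with the same load data statement as before.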