我想从嵌套的json数据创建外部配置单元表,但字段应从嵌套的json展平。
例如:-
{
"key1":"value1",
"key2":{
"nestedKey1":1,
"nestedKey2":2
}
}
配置单元表的格式或字段应扁平化
key1:字符串,key2.nestedKey1:Int,key2.nestedKey1:Int
预先感谢
答案 0 :(得分:1)
使用 JsonSerDe
并使用以下语法创建表:
hive> create table sample(key1 string,key2 struct<nestedKey1:int,nestedKey2:int>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
hive> select key1,key2.nestedkey1,key2.nestedkey2 from sample;
+---------+-------------+-------------+--+
| key1 | nestedkey1 | nestedkey2 |
+---------+-------------+-------------+--+
| value1 | 1 | 2 |
+---------+-------------+-------------+--+
hive> select * from sample;
+--------------+----------------------------------+--+
| sample.key1 | sample.key2 |
+--------------+----------------------------------+--+
| value1 | {"nestedkey1":1,"nestedkey2":2} |
+--------------+----------------------------------+--+
(或)
如果您要创建平坦化的json fields
表,请使用 RegexSerDe
并匹配正则表达式从数据中提取嵌套键。
请参考this链接以获取有关正则表达式Serde的更多详细信息。
更新:
输入数据:
{"key1":"value1","key2":{"nestedKey1":1,"nestedKey2":2}}
HiveTable:
hive> CREATE TABLE dd (key1 string, nestedKey1 string, nestedKey2 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES
('input.regex'=".*:\"(.*?)\",\"key2\":\\{\"nestedKey1\":(\\d),\"nestedKey2\":(\\d).*$");
从表中选择数据:
hive> select * from dd;
+---------+-------------+-------------+--+
| key1 | nestedkey1 | nestedkey2 |
+---------+-------------+-------------+--+
| value1 | 1 | 2 |
+---------+-------------+-------------+--+