Create a Hive table with flattened fields from nested JSON data

Time: 2018-12-17 12:44:47

Tags: hive hql hiveql

I want to create an external Hive table from nested JSON data, but the table's fields should be flattened out of the nested JSON.

For example:

{

    "key1":"value1",
    "key2":{
        "nestedKey1":1,
        "nestedKey2":2
    }

}

The fields of the Hive table should be flattened:

  

key1: String, key2.nestedKey1: Int, key2.nestedKey2: Int

Thanks in advance.

1 answer:

Answer 0 (score: 1)

Use JsonSerDe and create the table with the following syntax, declaring the nested object as a struct:

hive> create table sample(key1 string,key2 struct<nestedKey1:int,nestedKey2:int>) 
      ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';

hive> select key1,key2.nestedkey1,key2.nestedkey2 from sample;
+---------+-------------+-------------+--+
|  key1   | nestedkey1  | nestedkey2  |
+---------+-------------+-------------+--+
| value1  | 1           | 2           |
+---------+-------------+-------------+--+

hive> select * from sample;
+--------------+----------------------------------+--+
| sample.key1  |           sample.key2            |
+--------------+----------------------------------+--+
| value1       | {"nestedkey1":1,"nestedkey2":2}  |
+--------------+----------------------------------+--+
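The question asks for an external table with flattened columns, while the table above is a managed table with a struct column. A minimal sketch of one way to get both with JsonSerDe is to declare the struct on an external table and expose its members through a view. This assumes the JSON files hold one object per line under a hypothetical HDFS directory `/data/sample_json`, and that the hive-hcatalog-core jar is available to Hive; the names `sample_ext` and `sample_flat` are illustrative and not part of the original answer.

```sql
-- Assumption: /data/sample_json is a placeholder path; each line is one JSON object.
CREATE EXTERNAL TABLE sample_ext (
  key1 STRING,
  key2 STRUCT<nestedKey1:INT, nestedKey2:INT>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/data/sample_json';

-- The view presents the nested struct members as top-level columns.
CREATE VIEW sample_flat AS
SELECT key1,
       key2.nestedKey1 AS nestedKey1,
       key2.nestedKey2 AS nestedKey2
FROM sample_ext;
```

Querying `sample_flat` should then return `key1`, `nestedKey1`, and `nestedKey2` as ordinary columns, while parsing stays with the JSON SerDe rather than a hand-written pattern.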

(or)

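If you want the table itself to have flattened JSON fields, use RegexSerDe with a regular expression whose capture groups extract the nested keys from the data. Each capture group maps, in order, to one column of the table.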


Refer to this link for more details on RegexSerDe.


Update:

Input data:

{"key1":"value1","key2":{"nestedKey1":1,"nestedKey2":2}}

Hive table:

hive> CREATE  TABLE dd (key1 string, nestedKey1 string, nestedKey2 string) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES 
('input.regex'=".*:\"(.*?)\",\"key2\":\\{\"nestedKey1\":(\\d+),\"nestedKey2\":(\\d+).*$");

Select data from the table:

hive>  select * from dd;
+---------+-------------+-------------+--+
|  key1   | nestedkey1  | nestedkey2  |
+---------+-------------+-------------+--+
| value1  | 1           | 2           |
+---------+-------------+-------------+--+
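The RegexSerDe pattern is positional, so it only matches rows whose keys appear in exactly this order and with this spacing. As a hedged alternative (a different technique from the one in the answer), the built-in `get_json_object` function can pull nested values by path from a raw string column. The sketch below assumes a hypothetical table `raw_json` over a placeholder directory `/data/sample_json`, with one JSON document per line and no Ctrl-A characters in the data, so each whole line lands in the single column.

```sql
-- Assumption: raw_json and /data/sample_json are hypothetical names for illustration.
CREATE EXTERNAL TABLE raw_json (line STRING)
STORED AS TEXTFILE
LOCATION '/data/sample_json';

-- get_json_object returns strings, so numeric fields are cast explicitly.
SELECT get_json_object(line, '$.key1')                         AS key1,
       CAST(get_json_object(line, '$.key2.nestedKey1') AS INT) AS nestedKey1,
       CAST(get_json_object(line, '$.key2.nestedKey2') AS INT) AS nestedKey2
FROM raw_json;
```

Because the lookup is by path rather than by position, this form does not depend on key order within the JSON object, at the cost of parsing each line once per extracted field.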