Hive query for nested JSON

Asked: 2014-03-28 12:28:31

Tags: json hadoop hive

I have JSON stored in Hive:

 {"key":123,"c1":["s1","s2","s3"],"c2":{"k1":"v1","k2":"v2"}}
 {"key":456,"c1":["s4","s5","s6"],"c2":{"k3":"v3","k4":"v4"}}

Now I want to query this JSON in Hive so that I get the following output:

key c1 c1 c1 c2 c2 c2 c2
123 s1 s2 s3 k1 v1 k2 v2
456 s4 s5 s6 k3 v3 k4 v4

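Is this possible in Hive, or have I got the output structure wrong?

2 answers:

Answer 0 (score: 0)

You can use the Brickhouse JSON UDFs (http://github.com/klout/brickhouse) to parse the JSON into a Hive struct and then access the values.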

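-- Assumes the Brickhouse JAR has been added and from_json is registered as a function.
SELECT strct.key,
       strct.c1[0], strct.c1[1], strct.c1[2],
       map_keys(strct.c2)[0], map_values(strct.c2)[0],
       map_keys(strct.c2)[1], map_values(strct.c2)[1]
FROM (
  -- The second argument is a template value whose type describes the JSON layout.
  SELECT from_json(json_str,
        named_struct("key", 0, "c1", array(""), "c2", map("", ""))) AS strct
  FROM json_table
) js;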

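For more information, see the Brickhouse Confessions blog post: http://brickhouseconfessions.wordpress.com/2014/02/07/hive-and-json-made-simple/

Answer 1 (score: -1)

Posting an end-to-end solution. A step-by-step process for converting JSON into a Hive table:

Step 1) Install Maven if it is not already installed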
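To check what the query above is meant to return, here is a minimal Python sketch that flattens the two sample records from the question into the requested layout. It only emulates the result outside Hive. `flatten_row` is a hypothetical helper name, not part of Brickhouse, and the sketch relies on Python keeping JSON object keys in document order, which a Hive map does not promise.

```python
import json

# The two sample records from the question, one JSON document per line.
RAW_ROWS = [
    '{"key":123,"c1":["s1","s2","s3"],"c2":{"k1":"v1","k2":"v2"}}',
    '{"key":456,"c1":["s4","s5","s6"],"c2":{"k3":"v3","k4":"v4"}}',
]


def flatten_row(json_str):
    """Hypothetical helper: key, then the c1 elements, then c2 as key/value pairs."""
    record = json.loads(json_str)
    flat = [record["key"]]
    flat.extend(record["c1"])          # like strct.c1[0], strct.c1[1], strct.c1[2]
    for k, v in record["c2"].items():  # like map_keys(...)[i], map_values(...)[i]
        flat.extend([k, v])
    return flat


flattened = [flatten_row(r) for r in RAW_ROWS]
for row in flattened:
    print(" ".join(str(x) for x in row))
```

If the order of the `c2` pairs matters in Hive, it is probably safer to look up known keys directly (for example `strct.c2["k1"]`) than to index into `map_keys` and `map_values`.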

>$ sudo apt-get install maven

Step 2) Install git if it is not already installed, then clone the SerDe repository

>sudo git clone https://github.com/rcongiu/Hive-JSON-Serde.git

Step 3) Go into the $HOME/Hive-JSON-Serde folder

Step 4) Build the SerDe package

>sudo mvn -Pcdh5 clean package

Step 5) The SerDe JAR will be at $HOME/Hive-JSON-Serde/json-serde/target/json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar

Step 6) Add the SerDe as a dependency JAR in Hive

 hive> ADD JAR $HOME/Hive-JSON-Serde/json-serde/target/json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar;

Step 7) Create a JSON file at $HOME/books.json (example)

{"value": [{"id": "1","bookname": "A","properties": {"subscription": "1year","unit": "3"}},{"id": "2","bookname":"B","properties":{"subscription": "2years","unit": "5"}}]}

Step 8) Create the tmp1 table in Hive

 hive>CREATE TABLE tmp1 (
      value ARRAY<struct<id:string,bookname:string,properties:struct<subscription:string,unit:string>>>   
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( 
    'mapping.value' = 'value'   
) 
STORED AS TEXTFILE;

Step 9) Load the data from the JSON file into the tmp1 table

>LOAD DATA LOCAL INPATH '$HOME/books.json' INTO TABLE tmp1;

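Step 10) Create a tmp2 table by exploding tmp1. This intermediate step splits the multi-level JSON structure into multiple rows. Note: skip this step if your JSON structure is simple and single-level.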

hive>create table tmp2 as 
 SELECT *
 FROM tmp1
 LATERAL VIEW explode(value) itemTable AS items;
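As a rough illustration of what the explode step does, the Python sketch below emulates it on the books.json sample from step 7. It is not how Hive implements `LATERAL VIEW`; `explode_value` is a hypothetical helper, and the dictionaries merely stand in for table rows.

```python
import json

# Contents of books.json from step 7: one row whose `value` column is an array.
BOOKS_JSON = (
    '{"value": [{"id": "1","bookname": "A","properties": '
    '{"subscription": "1year","unit": "3"}},{"id": "2","bookname":"B",'
    '"properties":{"subscription": "2years","unit": "5"}}]}'
)


def explode_value(table_rows):
    """Emulate SELECT * FROM tmp1 LATERAL VIEW explode(value) itemTable AS items."""
    for row in table_rows:
        for element in row["value"]:
            # Each output row keeps the original `value` column
            # and gains one array element in the new `items` column.
            yield {"value": row["value"], "items": element}


tmp1 = [json.loads(BOOKS_JSON)]   # one input row
tmp2 = list(explode_value(tmp1))  # one output row per array element

for row in tmp2:
    print(row["items"]["id"], row["items"]["bookname"])
```

Every row of tmp2 still carries the whole original array in `value`, so the per-book fields are found in `items`.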

Step 11) Create the books table and load the values from the tmp2 table

hive>create table books as 
select items.id as id, items.bookname as name, items.properties.subscription as subscription, items.properties.unit as unit from tmp2;
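The projection into flat columns can be sketched in Python as follows. The input list is written by hand from the books.json sample and stands in for the exploded elements; `project_book` is a hypothetical helper, and the output column names follow the aliases used for the books table.

```python
# Exploded elements, hand-copied from the books.json sample (one per book).
exploded_items = [
    {"id": "1", "bookname": "A",
     "properties": {"subscription": "1year", "unit": "3"}},
    {"id": "2", "bookname": "B",
     "properties": {"subscription": "2years", "unit": "5"}},
]


def project_book(item):
    """Pull the nested struct fields up into one flat row."""
    return {
        "id": item["id"],
        "name": item["bookname"],
        "subscription": item["properties"]["subscription"],
        "unit": item["properties"]["unit"],
    }


books = [project_book(item) for item in exploded_items]

print("id name subscription unit")
for book in books:
    print(book["id"], book["name"], book["subscription"], book["unit"])
```

Each flat row is built from a single exploded element, which is why the two books end up on separate rows.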

Step 12) Drop the tmp tables

hive>drop table tmp1;
hive>drop table tmp2;

Step 13) Test the Hive table

hive>select * from books;

Output:

id name subscription unit
1  A    1year        3
2  B    2years       5