Split JSON data into array<string> columns

Time: 2014-12-19 10:52:36

Tags: json hadoop hive

I have many JSON records stored in a table, like this:

{"p_id":
   {"id_type":"XXX","id":"ABC111"},
   "r_ids":[
      {"id_type":"HAWARE_ABCDA1","id":"dfe234fhgt"},
      {"id_type":"HAWARE_CDFE2","id":"sgteth5673"}
   ]
}

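My requirement is to get the data in the following format:

p_id, p_id_type, r_ids (array<string>), r_id_type (array<string>)

For example: XXX,ABC111,[dfe234fhgt,sgteth5673],[HAWARE_ABCDA1,HAWARE_CDFE2]

I am able to get the whole set in exploded form, but how do I generate the arrays?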

My current query:

select p_id
      ,p_id_type
      ,get_json_object(c.qqqq,'$.id') as r_id
      ,get_json_object(c.qqqq,'$.id_type') as r_id_type
from
(
select p_id
      ,p_id_type
      ,qqqq
    from
    (
      select 
        get_json_object(a.main_pk,'$.id_type') as p_id_type
       ,get_json_object(a.main_pk,'$.id') as p_id
       ,split(regexp_replace(regexp_replace(a.r_ids,'\\}\\,\\{','\\}\\;\\{'),'\\[|\\]',''),'\\;') as yyyy
      from
      (
        select 
          get_json_object(json_string,'$.p_id') as main_pk
         ,get_json_object(json_string, '$.r_ids') as r_ids
        from sample_table limit 10
       ) a
    ) b lateral view explode(b.yyyy) yyyy_exploded as qqqq
   )c

Can anyone help me fix my mistake? Any suggestions would be greatly appreciated.

1 Answer:

Answer 0: (score: 0)

Complex data types are easier to handle if you use a JsonSerDe. Here is a small example that you can adapt:

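-- In the sample JSON, r_ids is a sibling of p_id rather than a field inside it,
-- so it is declared here as its own column.
CREATE TABLE table_json (
  p_id  struct<id_type:string,
               id:string>,
  r_ids array<struct<id_type:string,
                     id:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';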

LOAD DATA LOCAL INPATH '<path>/your_file.json'
OVERWRITE INTO TABLE table_json;
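
Once the data is in such a table, the arrays asked for in the question can be read directly, because Hive returns an array of field values when a struct field is selected on an array of structs. A minimal sketch, assuming `r_ids` is declared as a top-level column of type `array<struct<id_type:string,id:string>>` next to `p_id`, mirroring the sample JSON:

```sql
-- Assumes table_json has p_id as a struct and r_ids as a top-level array of structs.
-- If r_ids were nested inside p_id instead, the paths would be
-- p_id.r_ids.id and p_id.r_ids.id_type.
SELECT p_id.id       AS p_id,
       p_id.id_type  AS p_id_type,
       r_ids.id      AS r_ids,      -- array<string> of the ids
       r_ids.id_type AS r_id_type   -- array<string> of the id types
FROM table_json;
```

As far as I know, the JsonSerDe reads one JSON record per line of the input file, so pretty-printed records like the sample would need to be collapsed to a single line each before loading.

If changing the table definition is not an option, the exploded rows from the query in the question could probably be folded back into arrays with `collect_list` (available from Hive 0.13) and a `GROUP BY`. Here `exploded_rows` is a hypothetical view standing in for that query's result:

```sql
-- exploded_rows is hypothetical: the question's query saved as a view,
-- with columns p_id, p_id_type, r_id, r_id_type.
SELECT p_id,
       p_id_type,
       collect_list(r_id)      AS r_ids,
       collect_list(r_id_type) AS r_id_type
FROM exploded_rows
GROUP BY p_id, p_id_type;
```

Hive does not guarantee the order of elements produced by `collect_list`, so the SerDe approach is the safer choice when the ids and id types must stay aligned position by position.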