Splitting a row with a unique ID into several rows using get_json_object in Hive

Time: 2019-06-10 21:40:57

Tags: json hive

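I am trying to use get_json_object in Hive to split a row into several rows, one per unique question reference. Is that possible?

I have only just started using get_json_object in Hive. The question references are stored in a JSON-formatted column, and I want one output row per question reference while keeping the information from the other columns.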

SELECT  reference AS item_reference, 
        get_json_object(questions, '$.reference') AS question_reference,
        get_json_object(questions, '$.type') AS question_type
from sandbox.exportitems limit 10;

For example, given the input:

reference  | questions  
NP002_025  | [{"reference":"3dfc54c0","type":"clozeformula"}]
DP001_1_10 | [{"reference":"73879547","type":"imageclozeformula"},
              {"reference":"466a5b88","type":"clozedropdown"}]

The expected output is:

reference  | questions_reference | questions_type
NP002_025  | 3dfc54c0            | clozeformula  
DP001_1_10 | 73879547            | imageclozeformula  
DP001_1_10 | 466a5b88            | clozedropdown

1 answer:

Answer 0: (score: 0)

OK, it can be done as follows:

with core as (
SELECT 
    'DP001_1_10' as reference,
    explode(
        split(
            regexp_replace(
                regexp_replace(
                    regexp_replace('[{"reference":"73879547","type":"imageclozeformula"},{"reference":"466a5b88","type":"clozedropdown"}]', '\\]','')
                ,'\\}\\,\\{','\\}\\;\\{')
            ,'\\[','')
        ,'\\;')
    ) as json_str
)
select 
    reference,
    get_json_object(json_str,'$.reference') as questions_reference,
    get_json_object(json_str,'$.type') as questions_type
from 
    core;
+-------------+----------------------+--------------------+--+
|  reference  | questions_reference  |   questions_type   |
+-------------+----------------------+--------------------+--+
| DP001_1_10  | 73879547             | imageclozeformula  |
| DP001_1_10  | 466a5b88             | clozedropdown      |
+-------------+----------------------+--------------------+--+
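
To make the nested calls in the query above easier to follow, here is a small sketch that shows each intermediate stage as its own column. The CTE name demo and the column names s and step1 to step4 are made up for illustration; the expressions themselves are copied from the answer.

```sql
-- Illustrative breakdown only: demo, s and step1..step4 are hypothetical names.
-- step1 removes the closing ] of the array.
-- step2 turns every },{ boundary into };{ so that ; marks element boundaries.
-- step3 removes the opening [ of the array.
-- step4 splits on ; and returns an array with one JSON object string per element.
with demo as (
select
    '[{"reference":"73879547","type":"imageclozeformula"},{"reference":"466a5b88","type":"clozedropdown"}]' as s
)
select
    regexp_replace(s, '\\]', '') as step1,
    regexp_replace(regexp_replace(s, '\\]', ''), '\\}\\,\\{', '\\}\\;\\{') as step2,
    regexp_replace(regexp_replace(regexp_replace(s, '\\]', ''), '\\}\\,\\{', '\\}\\;\\{'), '\\[', '') as step3,
    split(regexp_replace(regexp_replace(regexp_replace(s, '\\]', ''), '\\}\\,\\{', '\\}\\;\\{'), '\\[', ''), '\\;') as step4
from
    demo;
```

Because ; is used as a temporary delimiter and every [ and ] is removed, this approach assumes that none of the JSON values contain ;, [ or ] themselves.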

You only need to replace the sample strings 'DP001_1_10' and '[{"reference":"73879547","type":"imageclozeformula"},{"reference":"466a5b88","type":"clozedropdown"}]' with the column names reference and questions. So the final HQL you want might look like this:

with core as (
select 
    reference,
    explode(
        split(
            regexp_replace(
                regexp_replace(
                    regexp_replace(questions, '\\]','')
                ,'\\}\\,\\{','\\}\\;\\{')
            ,'\\[','')
        ,'\\;')
    ) as json_str
from
    sandbox.exportitems
)
select 
    reference,
    get_json_object(json_str,'$.reference') as questions_reference,
    get_json_object(json_str,'$.type') as questions_type
from 
    core;
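
As a possible variation on the final query above, the explode can be moved into a LATERAL VIEW, which is the form the Hive manual describes for combining a table-generating function with other columns (some Hive versions reject explode next to other expressions in the same select list). This is a sketch under assumptions, not a tested replacement: it assumes questions always holds a JSON array of flat objects with no nested objects or arrays, and the aliases t and q are made up for illustration. It also splits with a lookbehind/lookahead pattern instead of a temporary ; delimiter, so values containing ; are not a problem.

```sql
-- Sketch only: assumes questions is a JSON array of flat objects.
-- The aliases t and q are hypothetical names chosen for this example.
-- regexp_replace strips the outer [ and ] of the array.
-- split cuts at each comma (plus optional whitespace) that sits between } and {,
-- and the lookarounds keep both braces in the resulting pieces.
select
    t.reference,
    get_json_object(q.json_str, '$.reference') as questions_reference,
    get_json_object(q.json_str, '$.type') as questions_type
from
    sandbox.exportitems t
lateral view explode(
    split(
        regexp_replace(t.questions, '^\\[|\\]$', ''),
        '(?<=\\}),\\s*(?=\\{)'
    )
) q as json_str;
```

Note that a plain LATERAL VIEW drops rows whose questions value is NULL, since explode then produces no rows; LATERAL VIEW OUTER keeps them with NULL in the generated column.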