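I am trying to split a row into several rows, one per unique question reference, using get_json_object in Hive. Is this possible?
I have only just started using get_json_object in Hive. I am trying to split one row into several rows, one for each unique question reference (taken from a JSON-formatted column), while keeping the information from the other columns.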
SELECT reference AS item_reference,
       get_json_object(questions, '$.reference') AS question_reference,
       get_json_object(questions, '$.type') AS question_type
FROM sandbox.exportitems
LIMIT 10;
For example, given the input:
reference | questions
NP002_025 | [{"reference":"3dfc54c0","type":"clozeformula"}]
DP001_1_10 | [{"reference":"73879547","type":"imageclozeformula"},
{"reference":"466a5b88","type":"clozedropdown"}]
The expected output is:
reference | questions_reference | questions_type
NP002_025 | 3dfc54c0 | clozeformula
DP001_1_10 | 73879547 | imageclozeformula
DP001_1_10 | 466a5b88 | clozedropdown
Answer 0 (score: 0)
Yes, it can be done like this:
with core as (
  select
    'DP001_1_10' as reference,
    explode(
      split(
        regexp_replace(
          regexp_replace(
            regexp_replace('[{"reference":"73879547","type":"imageclozeformula"},{"reference":"466a5b88","type":"clozedropdown"}]', '\\]', ''),
            '\\}\\,\\{', '\\}\\;\\{'),
          '\\[', ''),
        '\\;')
    ) as json_str
)
select
  reference,
  get_json_object(json_str, '$.reference') as questions_reference,
  get_json_object(json_str, '$.type') as questions_type
from
  core;
+-------------+----------------------+--------------------+--+
| reference | questions_reference | questions_type |
+-------------+----------------------+--------------------+--+
| DP001_1_10 | 73879547 | imageclozeformula |
| DP001_1_10 | 466a5b88 | clozedropdown |
+-------------+----------------------+--------------------+--+
You only need to replace the sample strings 'DP001_1_10'
and '[{"reference":"73879547","type":"imageclozeformula"},{"reference":"466a5b88","type":"clozedropdown"}]'
with the column names reference and questions.
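To see why the nested calls work, it can help to look at the intermediate strings. The sketch below runs each stage of the regexp_replace chain on the same sample literal, one column per stage. The inline alias q and the column aliases step1 to step3 are names made up for illustration; the patterns are the same as in the query above.

```sql
-- Each column shows the sample string after one more stage of the chain.
SELECT
  -- Stage 1: remove every ']' (here, only the closing bracket of the array).
  regexp_replace(q, '\\]', '') AS step1,
  -- Stage 2: turn each '},{' object boundary into '};{' so ';' can act as a delimiter.
  regexp_replace(
    regexp_replace(q, '\\]', ''),
    '\\}\\,\\{', '\\}\\;\\{') AS step2,
  -- Stage 3: remove every '[' (here, only the opening bracket of the array).
  regexp_replace(
    regexp_replace(
      regexp_replace(q, '\\]', ''),
      '\\}\\,\\{', '\\}\\;\\{'),
    '\\[', '') AS step3
FROM (
  SELECT '[{"reference":"73879547","type":"imageclozeformula"},{"reference":"466a5b88","type":"clozedropdown"}]' AS q
) t;
```

After stage 3, the string is a ';'-separated list of JSON objects, which is what split and explode then turn into one row per object. Two limitations follow from this: the approach assumes no JSON value contains ';', '[' or ']', and the '},{' pattern does not match if there is whitespace or a line break after the comma.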
So the final HQL you want might look like this:
with core as (
  select
    reference,
    explode(
      split(
        regexp_replace(
          regexp_replace(
            regexp_replace(questions, '\\]', ''),
            '\\}\\,\\{', '\\}\\;\\{'),
          '\\[', ''),
        '\\;')
    ) as json_str
  from
    sandbox.exportitems
)
select
  reference,
  get_json_object(json_str, '$.reference') as questions_reference,
  get_json_object(json_str, '$.type') as questions_type
from
  core;
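One caveat: as far as I know, Hive itself does not accept a UDTF such as explode next to other expressions in the same SELECT list, although some engines that speak HiveQL (Spark SQL, for example) do. If the query above is rejected for that reason, a LATERAL VIEW form should give the same result. The following is a sketch under that assumption, not a tested drop-in. The aliases e and j are arbitrary, and it differs from the query above in two small ways: it strips only the outermost brackets, and it tolerates whitespace around the comma between objects, as in the sample input where the second object starts on a new line.

```sql
SELECT
  e.reference,
  get_json_object(j.json_str, '$.reference') AS questions_reference,
  get_json_object(j.json_str, '$.type')      AS questions_type
FROM sandbox.exportitems e
-- LATERAL VIEW joins each source row with the rows produced by explode().
LATERAL VIEW explode(
  split(
    regexp_replace(
      -- Strip only the leading '[' and the trailing ']' of the array.
      regexp_replace(e.questions, '^\\[|\\]$', ''),
      -- Mark object boundaries with ';', allowing whitespace around the comma.
      '\\}\\s*,\\s*\\{', '\\}\\;\\{'),
    '\\;')
) j AS json_str;
```

As before, this still assumes that no JSON value contains a ';'. Rows whose questions column is NULL produce no output rows with a plain LATERAL VIEW; LATERAL VIEW OUTER keeps them with NULL question fields.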