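I'm running a very basic Hive query. I'm trying to extract a JSON field from a dataset, but I always get \N (NULL) for the JSON field, while some_string itself is returned fine.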
Here is my query:
WITH dataset AS (
SELECT
CAST(
'{ "traceId": "abc", "additionalData": "{\"Star Rating\":\"3\"}", "locale": "en_US", "content": { "contentType": "PB", "content": "T S", "bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", "bTimestamp": 0, "title": "T S" } }
}' AS STRING) AS some_string
)
SELECT some_string, get_json_object(dataset.some_string, '$.traceId') FROM dataset
Question: how can I get the JSON field here?
Answer:
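The problem is the backslashes. Inside a Hive string literal, a single backslash is treated as the escape character for the double quote ("), and Hive removes it: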
hive> select '\"';
OK
"
Time taken: 0.069 seconds, Fetched: 1 row(s)
When you write two backslashes, Hive removes only one of them, leaving \":
hive> select '\\"';
OK
\"
Time taken: 0.061 seconds, Fetched: 1 row(s)
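The effect on the JSON can be modeled outside Hive. The Python sketch below is a simplification under stated assumptions: hive_unescape is a hypothetical helper that approximates Hive's literal unescaping (a backslash escapes the next character, plus \n, \t and \r), and json.loads stands in for Hive's JSON parser. It uses a trimmed-down version of the question's additionalData field.

```python
import json

# Hypothetical, simplified model of Hive's string-literal unescaping
# (an assumption, not Hive's real code): a backslash escapes the next
# character, and a few letters map to control characters.
_SPECIAL = {"n": "\n", "t": "\t", "r": "\r"}


def hive_unescape(literal: str) -> str:
    out = []
    i = 0
    while i < len(literal):
        ch = literal[i]
        if ch == "\\" and i + 1 < len(literal):
            nxt = literal[i + 1]
            out.append(_SPECIAL.get(nxt, nxt))  # drop the backslash
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The additionalData fragment exactly as typed inside the Hive literal.
single = r'{"additionalData": "{\"Star Rating\":\"3\"}"}'
double = r'{"additionalData": "{\\"Star Rating\\":\\"3\\"}"}'

print(hive_unescape(single))                 # {"additionalData": "{"Star Rating":"3"}"}
print(is_valid_json(hive_unescape(single)))  # False: invalid JSON, hence NULL
print(hive_unescape(double))                 # {"additionalData": "{\"Star Rating\":\"3\"}"}
print(is_valid_json(hive_unescape(double)))  # True
```

With single backslashes the inner quotes end up bare and terminate the additionalData string early; with double backslashes a \" survives, which is a legal JSON escape.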
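So with single backslashes, the string reaches get_json_object as "additionalData": "{"Star Rating":"3"}", which is not valid JSON, and get_json_object returns NULL for invalid JSON input. With double backslashes, \" remains in the string, which is a valid JSON escape, so it works: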
WITH dataset AS (
SELECT
CAST(
'{ "traceId": "abc", "additionalData": "{\\"Star Rating\\":\\"3\\"}", "locale": "en_US", "content": { "contentType": "PB", "content": "T S", "bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", "bTimestamp": 0, "title": "T S" } }
}' AS STRING) AS some_string
)
SELECT some_string, get_json_object(dataset.some_string, '$.traceId') FROM dataset;
OK
{ "traceId": "abc", "additionalData": "{\"Star Rating\":\"3\"}", "locale": "en_US", "content": { "contentType": "PB", "content": "T S", "bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", "bTimestamp": 0, "title": "T S" } }
} abc
Time taken: 0.788 seconds, Fetched: 1 row(s)
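Note that in this version additionalData is still a JSON string rather than a nested object. As a hedged Python sketch (json.loads standing in for Hive's JSON parsing, not Hive itself), with some_string copied from the output above: $.traceId resolves to abc, but reaching Star Rating needs a second parse of the additionalData value.

```python
import json

# some_string as Hive produces it from the double-backslash literal
# (copied from the query output above).
some_string = (
    r'{ "traceId": "abc", "additionalData": "{\"Star Rating\":\"3\"}", '
    r'"locale": "en_US", "content": { "contentType": "PB", "content": "T S", '
    r'"bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", '
    r'"bTimestamp": 0, "title": "T S" } }'
    "\n}"
)

doc = json.loads(some_string)
print(doc["traceId"])                         # abc
print(type(doc["additionalData"]).__name__)   # str: still encoded JSON text
inner = json.loads(doc["additionalData"])     # second parse of the inner JSON
print(inner["Star Rating"])                   # 3
print(doc["content"]["bP"]["mD"]["S R"])      # 3 (a genuinely nested object)
```

The regexp_replace variant that follows removes the need for that second parse by turning additionalData into a real nested object.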
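You can also strip the double quotes before { and after } in additionalData, so that it becomes a nested JSON object instead of a string. Here the literal keeps single backslashes; Hive removes them before regexp_replace runs, and the two replacements then delete the now-unbalanced outer quotes: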
WITH dataset AS (
SELECT
regexp_replace(regexp_replace(
'{ "traceId": "abc", "additionalData": "{\"Star Rating\":\"3\"}", "locale": "en_US", "content": { "contentType": "PB", "content": "T S", "bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", "bTimestamp": 0, "title": "T S" } }
}', '\\"\\{', '\\{'), '\\}\\"', '\\}') AS some_string
)
SELECT some_string, get_json_object(dataset.some_string, '$.traceId') FROM dataset;
This returns:
OK
{ "traceId": "abc", "additionalData": {"Star Rating":"3"}, "locale": "en_US", "content": { "contentType": "PB", "content": "T S", "bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", "bTimestamp": 0, "title": "T S" } }
} abc
Time taken: 7.035 seconds, Fetched: 1 row(s)
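The same two substitutions can be sketched in Python to show why the result parses. This is a model, not Hive: raw is assumed to be what Hive passes to regexp_replace after stripping the single backslashes, and Python's re.sub stands in for regexp_replace (Python's replacement strings don't need the \{ escaping used in the Hive query).

```python
import json
import re

# What regexp_replace receives: Hive has already removed the single
# backslashes, so additionalData's quotes are unbalanced (invalid JSON).
raw = (
    '{ "traceId": "abc", "additionalData": "{"Star Rating":"3"}", '
    '"locale": "en_US", "content": { "contentType": "PB", "content": "T S", '
    '"bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", '
    '"bTimestamp": 0, "title": "T S" } }\n}'
)

# Same two replacements as the Hive query: '"{' -> '{', then '}"' -> '}'.
fixed = re.sub(r'\}"', "}", re.sub(r'"\{', "{", raw))

doc = json.loads(fixed)
print(doc["traceId"])                        # abc
print(doc["additionalData"]["Star Rating"])  # 3
```

One caveat of this approach: the patterns are applied to the whole document, so any other string value that starts with { or ends with } would lose its quotes as well.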