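I have a table with a STRING column that holds a stringified list of JSON objects, like this:
'[{"a": 5, "b": 6}, {"a": 7, "b": 8}]'
I want to unnest this array and then use json_extract() or json_extract_scalar() to pull values out of those objects.
It is not clear from BigQuery's JSON Function documentation whether I can do this with built-in functionality.
Is a UDF required for this, or does this functionality already exist in BigQuery?
The UDF below does what I need: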
CREATE TEMP FUNCTION JSON_EXTRACT_ARRAY(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  return JSON.parse(input).map(x => JSON.stringify(x));
""";

with
  raw as (
    select
      1 as id,
      '[{"a": 5, "b": 6}, {"a": 7, "b": 8}]' as body
  )
select
  id,
  json_extract(entry, '$.a') as a,
  json_extract(entry, '$.b') as b
from
  raw,
  unnest(json_extract_array(body)) as entry
Answer 0: (score: 0)
Try something like this:
with
  raw as (
    select
      1 as id,
      '[{"a": 5, "b": 6}, {"a": 7, "b": 8}]' as body
  )
select
  r.id,
  r.body,
  regexp_extract_all(r.body, r'({.*?})'),
  json_extract(entry, '$.a') as a,
  json_extract(entry, '$.b') as b
from
  raw as r
  cross join unnest(
    regexp_extract_all(r.body, r'({.*?})')
  ) as entry
Answer 1: (score: 0)
Or a more general solution:
with
  raw as (
    select
      1 as id,
      '[{"a": 5, "b": {"x": 1, "y": 2}}, {"b": {"c": 5, "d": 8}, "a": 7}]' as body
  )
select
  r.id,
  r.body,
  split(trim(r.body, '[]{}'), '}, {'),
  json_extract(concat('{', entry, '}'), '$.a') as a,
  json_extract(concat('{', entry, '}'), '$.b') as b
from
  raw as r
  cross join unnest(
    split(trim(r.body, '[]{}'), '}, {')
  ) as entry
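Answer 2: (score: 0)
Google has added a JSON_EXTRACT_ARRAY function to its standard SQL, so this can now be done without a UDF. In fact, because the UDF in the OP has the same name (JSON_EXTRACT_ARRAY), you can take the query that sits below the UDF definition and run it as is, and it will work.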
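As a minimal sketch of that point, here is the question's query with the CREATE TEMP FUNCTION statement left out, so that json_extract_array resolves to the built-in function. It reuses the sample row from the question. Swapping json_extract for json_extract_scalar is my own variation, not part of the original query; I am assuming plain string values are wanted rather than JSON-formatted ones.

```sql
-- No UDF here: json_extract_array is the built-in BigQuery function.
with
  raw as (
    select
      1 as id,
      '[{"a": 5, "b": 6}, {"a": 7, "b": 8}]' as body
  )
select
  id,
  -- json_extract_scalar is a substitution made for this sketch;
  -- the query in the question uses json_extract instead.
  json_extract_scalar(entry, '$.a') as a,
  json_extract_scalar(entry, '$.b') as b
from
  raw,
  -- The built-in returns ARRAY<STRING>, one JSON-formatted string per element.
  unnest(json_extract_array(body)) as entry
```

This should give one output row per object in the array, each carrying the id of the row it came from.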
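If performance matters, you can also take advantage of BigQuery's nesting capabilities by extracting the body data into a repeated record instead of fully denormalizing the table: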
with
  raw as (
    select
      1 as id,
      '[{"a": 5, "b": 6}, {"a": 7, "b": 8}]' as body
  )
select
  r.id,
  array(
    select
      struct(
        json_value(items, '$.a') as a,
        json_value(items, '$.b') as b
      ) as b
    from unnest(json_extract_array(body, '$')) as items
  ) as body_record_repeated
from raw r
Which returns: