Unnesting a stringified array of JSON objects in BigQuery

Time: 2019-07-19 18:19:24

Tags: google-bigquery

I have a table with a STRING column that contains a stringified list of JSON objects, like this:

'[{"a": 5, "b": 6}, {"a": 7, "b": 8}]'

I want to unnest this array and then use json_extract() or json_extract_scalar() to get values out of those objects.

BigQuery's JSON function documentation does not make it clear whether I can do this with built-in functionality.

Is a UDF required for this, or does BigQuery have this capability built in?

The UDF below does what I need:

CREATE TEMP FUNCTION
  JSON_EXTRACT_ARRAY(input STRING)
  RETURNS ARRAY<STRING>
  LANGUAGE js AS """  
return JSON.parse(input).map(x => JSON.stringify(x));
""";

with

raw as (
  select
    1 as id,
    '[{"a": 5, "b": 6}, {"a": 7, "b": 8}]' as body
)

select
  id,
  json_extract(entry, '$.a') as a,
  json_extract(entry, '$.b') as b
from
  raw,
  unnest(json_extract_array(body)) as entry

3 answers:

Answer 0 (score: 0)

Try something like this:


with

raw as (
    select
        1 as id,
        '[{"a": 5, "b": 6}, {"a": 7, "b": 8}]' as body
)

select
    r.id,
    r.body,
    regexp_extract_all(r.body, r'({.*?})'),
    json_extract(entry, '$.a') as a,
    json_extract(entry, '$.b') as b
from
    raw as r
    cross join  unnest(
                    regexp_extract_all(r.body, r'({.*?})')
                ) as entry
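
One caveat worth checking before relying on this pattern: the lazy match {.*?} stops at the first closing brace, so an element that itself contains an object would be cut short. The query below is a minimal sketch of that case. The nested "b" value is a hypothetical input, not data from the question, and the braces are escaped here, which should match the same text as the unescaped pattern.

```sql
-- Hypothetical input: the first element holds a nested object under "b".
WITH raw AS (
  SELECT '[{"a": 5, "b": {"x": 1, "y": 2}}, {"a": 7, "b": 8}]' AS body
)
SELECT
  entry  -- each fragment the regex produced, shown as-is
FROM
  raw,
  UNNEST(REGEXP_EXTRACT_ALL(body, r'(\{.*?\})')) AS entry
```

The first fragment would be expected to end right after "y": 2}, leaving the outer object unclosed, so the approach seems safe only when every element is a flat object.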

Answer 1 (score: 0)

Or a more general solution:

with

raw as (
    select
        1 as id,
        '[{"a": 5, "b": {"x": 1, "y": 2}}, {"b": {"c": 5, "d": 8}, "a": 7}]' as body
)

select
    r.id,
    r.body,
    split(trim(r.body, '[]{}'), '}, {'),
    json_extract(concat('{', entry, '}'), '$.a') as a,
    json_extract(concat('{', entry, '}'), '$.b') as b
from
    raw as r
    cross join  unnest(
                    split(trim(r.body, '[]{}'), '}, {')
                ) as entry
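
Note that this approach depends on the elements being separated by exactly '}, {'. The query below is a rough sketch of how to check that assumption against real data. The body without a space after the comma is a hypothetical input, and the comparison relies on the built-in JSON_EXTRACT_ARRAY function that BigQuery now offers.

```sql
-- Hypothetical input: same objects as in the question, but no space after the comma.
WITH raw AS (
  SELECT '[{"a": 5, "b": 6},{"a": 7, "b": 8}]' AS body
)
SELECT
  -- The separator '}, {' never occurs, so SPLIT returns the whole string as one piece.
  ARRAY_LENGTH(SPLIT(TRIM(body, '[]{}'), '}, {')) AS split_pieces,
  -- The JSON-aware function counts the actual array elements.
  ARRAY_LENGTH(JSON_EXTRACT_ARRAY(body)) AS json_elements
FROM raw
```

If the two counts differ for any row, the split-based query is not seeing the same elements as a JSON parser would. TRIM also strips every trailing ']' and '}' character, so a last element that ends in a nested object would likely lose one closing brace too many.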

Answer 2 (score: 0)

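Google has since added a JSON_EXTRACT_ARRAY function to its standard SQL, so this can now be done without a UDF. In fact, because the UDF in the OP has the same name (JSON_EXTRACT_ARRAY), you can run the query that follows the UDF definition as-is, without the CREATE TEMP FUNCTION statement, and it will work.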
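
For reference, a minimal sketch of what that could look like, reusing the sample row from the question. The switch to JSON_EXTRACT_SCALAR is an assumption about the desired output: it returns the values as plain strings, whereas JSON_EXTRACT would return them as JSON-formatted strings.

```sql
-- Same sample row as in the question; no CREATE TEMP FUNCTION is needed.
WITH raw AS (
  SELECT
    1 AS id,
    '[{"a": 5, "b": 6}, {"a": 7, "b": 8}]' AS body
)
SELECT
  id,
  JSON_EXTRACT_SCALAR(entry, '$.a') AS a,
  JSON_EXTRACT_SCALAR(entry, '$.b') AS b
FROM
  raw,
  -- The built-in function returns ARRAY<STRING>, one JSON string per element.
  UNNEST(JSON_EXTRACT_ARRAY(body)) AS entry
```

This should yield one row per array element. If numeric columns are needed, wrapping each value in CAST(... AS INT64) is one option.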

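If performance matters, you can also take advantage of BigQuery's nesting capabilities by extracting the body data into a repeated record, rather than fully denormalizing the table.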

with 
    raw as (
        select
            1 as id,
            '[{"a": 5, "b": 6}, {"a": 7, "b": 8}]' as body
    )

select
    r.id,
    array(
        select
            struct (
                json_value(items, '$.a') as a,
                json_value(items, '$.b') as b
            ) as b 
        from unnest(json_extract_array(body, '$')) as items
    ) as body_record_repeated
from raw r

which returns:

[Image: BigQuery repeated record result]