I happen to have a stringified array in a BigQuery field:
'["a","b","c"]'
I want to convert it into an array that BigQuery understands. I would like to be able to do this in Standard SQL:
with k as (select '["a","b","c"]' as x)
select x from k, unnest(x) x
I tried JSON_EXTRACT('["a","b","c"]','$') and everything else I could find online.
Any ideas?
Answer 0 (score: 12)
The following is for BigQuery Standard SQL:
#standardSQL
WITH k AS (
SELECT 1 AS id, '["a","b","c"]' AS x UNION ALL
SELECT 2, '["x","y"]'
)
SELECT
id,
ARRAY(SELECT * FROM UNNEST(SPLIT(SUBSTR(x, 2 , LENGTH(x) - 2)))) AS x
FROM k
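It converts your string column into an array column. Because only the outer brackets are stripped before splitting on commas, each element keeps its surrounding double quotes ("a", "b", "c"), and a value that itself contains a comma would be split in two.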
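Answer 1 (score: 1)
I would like to offer an alternative. Since the array is a string, you can simply extract the values with REGEXP_EXTRACT_ALL:
REGEXP_EXTRACT_ALL(your_string, r'[0-9a-zA-Z][^"]*') as arr
The * quantifier lets single-character values such as a match too. You may find it too restrictive that each value must start with an alphanumeric character; adjust the pattern to your needs.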
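A pattern like this is easy to sanity-check outside BigQuery. The sketch below uses JavaScript's regex engine rather than BigQuery's RE2 (an assumption: the two agree for a pattern this simple) and compares the + and * quantifiers after the leading character class. The helper name extractAll is hypothetical.

```javascript
// Hypothetical helper: returns every match of `pattern` in `s` as an array.
// `pattern` must carry the global flag, as required by String.prototype.matchAll.
function extractAll(s, pattern) {
  return Array.from(s.matchAll(pattern), (m) => m[0]);
}

const sample = '["a","b","c"]';

// One alphanumeric character followed by ONE OR MORE non-quote characters.
const plusPattern = /[0-9a-zA-Z][^"]+/g;
// One alphanumeric character followed by ZERO OR MORE non-quote characters.
const starPattern = /[0-9a-zA-Z][^"]*/g;

console.log(extractAll(sample, plusPattern));        // [] (single-character values need two characters to match)
console.log(extractAll(sample, starPattern));        // [ 'a', 'b', 'c' ]
console.log(extractAll('["x1","yy"]', plusPattern)); // [ 'x1', 'yy' ]
```

With +, every value needs at least two characters, so the sample string from the question yields no matches; with *, single-character values are picked up as well.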
Answer 2 (score: 1)
It is much easier with a JS UDF.
CREATE TEMP FUNCTION
JSON_EXTRACT_ARRAY(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(input);
""";
WITH
k AS (
SELECT
'["a","b","c"]' AS x)
SELECT
JSON_EXTRACT_ARRAY(x) AS x
FROM
k
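To see what the UDF body does on its own, here is a minimal sketch in plain JavaScript, run outside BigQuery. The function names are hypothetical, and the defensive variant that returns an empty array on malformed input is an addition for illustration, not part of the answer above.

```javascript
// Same logic as the UDF body: parse the JSON text into a JavaScript array.
function parseStringifiedArray(input) {
  return JSON.parse(input);
}

// Hypothetical defensive variant: JSON.parse throws a SyntaxError on
// malformed input, so fall back to an empty array in that case, and also
// when the parsed value is not an array.
function parseStringifiedArrayOrEmpty(input) {
  try {
    const parsed = JSON.parse(input);
    return Array.isArray(parsed) ? parsed : [];
  } catch (e) {
    return [];
  }
}

console.log(parseStringifiedArray('["a","b","c"]'));      // [ 'a', 'b', 'c' ]
console.log(parseStringifiedArrayOrEmpty('not json'));    // []
console.log(parseStringifiedArrayOrEmpty('{"a": 1}'));    // []
```

Whether silently returning an empty array is preferable to failing the query depends on how dirty the column is; the plain version surfaces bad rows immediately.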
Answer 3 (score: 1)
Recently (2020), the JSON_EXTRACT_ARRAY function was added to BigQuery Standard SQL.
It makes it easy to get the expected behavior without a UDF or any tricks:
with k as (select JSON_EXTRACT_ARRAY('["a","b","c"]', '$') as x)
select unnested_x from k, unnest(x) unnested_x
This results in:
╔══════════════╗
║ "unnested_x" ║
╠══════════════╣
║ "a" ║
║ "b" ║
║ "c" ║
╚══════════════╝
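Answer 4 (score: 0)
This solution updates @northtree's answer and handles array members that are objects more gracefully, returning them as stringified JSON instead of the string [object Object]: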
CREATE TEMP FUNCTION
JSON_EXTRACT_ARRAY(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(input).map(x => JSON.stringify(x));
""";
with
raw as (
select
1 as id,
'[{"a": 5, "b": 6}, {"a": 7}, 456]' as body
)
select
id,
entry,
json_extract(entry, '$'),
json_extract(entry, '$.a'),
json_extract(entry, '$.b')
from
raw,
unnest(json_extract_array(body)) as entry
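The difference between the two UDF bodies can be seen in plain JavaScript. This is a sketch run outside BigQuery; it assumes that coercing an object member to a string behaves like String(x), which matches the [object Object] result described above. The function names are hypothetical.

```javascript
const body = '[{"a": 5, "b": 6}, {"a": 7}, 456]';

// Plain string coercion of each member: objects lose their contents.
function coerceMembers(input) {
  return JSON.parse(input).map((x) => String(x));
}

// Re-serialize each member as JSON text, as in the UDF above.
function stringifyMembers(input) {
  return JSON.parse(input).map((x) => JSON.stringify(x));
}

console.log(coerceMembers(body));
// [ '[object Object]', '[object Object]', '456' ]
console.log(stringifyMembers(body));
// [ '{"a":5,"b":6}', '{"a":7}', '456' ]
```

Because every member is re-serialized, string members come back wrapped in double quotes (JSON.stringify("a") yields "a" including the quotes), so each entry remains valid JSON that json_extract can read.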