How do I convert a stringified array into an array in BigQuery?

Asked: 2017-09-13 14:11:22

Tags: google-bigquery standard-sql

I happen to have a stringified array in a field in BigQuery:

'["a","b","c"]'

I want to convert it into an array that BigQuery understands, so that I can do something like this in standard SQL:

with k as (select '["a","b","c"]' as x)
select x from k, unnest(x) x

I tried JSON_EXTRACT('["a","b","c"]','$') and everything else I could find online.

Any ideas?

5 answers:

Answer 0 (score: 12)

The following is for BigQuery Standard SQL:

   
#standardSQL
WITH k AS (
  SELECT 1 AS id, '["a","b","c"]' AS x UNION ALL
  SELECT 2, '["x","y"]' 
)
SELECT 
  id, 
  ARRAY(SELECT * FROM UNNEST(SPLIT(SUBSTR(x, 2 , LENGTH(x) - 2)))) AS x
FROM k

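It converts your string column into an array column. Each element keeps its surrounding double quotes (for example "a"), because SUBSTR only strips the outer brackets.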
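
To see what the SUBSTR/SPLIT approach produces, here is a rough JavaScript mirror of the same two steps (JavaScript being the language BigQuery uses for UDF bodies). The helper name substrSplit is hypothetical, and this only approximates the SQL expression; it is not BigQuery code.

```javascript
// Hypothetical helper mirroring SPLIT(SUBSTR(x, 2, LENGTH(x) - 2)):
// drop the first and last character, then split on commas.
function substrSplit(x) {
  return x.slice(1, -1).split(",");
}

// The brackets are gone, but every element keeps its double quotes.
console.log(substrSplit('["a","b","c"]'));

// A comma inside an element is also treated as a separator,
// so this input is cut into three pieces instead of two.
console.log(substrSplit('["a,b","c"]'));
```

If the values can contain commas or escaped quotes, a real JSON parser is likely the safer choice.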

Answer 1 (score: 1)

I'd like to offer an alternative. Since the array is a string, you can simply extract the values with REGEXP_EXTRACT_ALL:

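REGEXP_EXTRACT_ALL(your_string, r'[0-9a-zA-Z][^"]*') as arr

The pattern matches an alphanumeric character followed by any run of non-quote characters, so single-character elements such as a are matched as well. You may find it too restrictive that every match must start with an alphanumeric character; adjust the pattern to your needs.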
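
The quantifier after [^"] matters for single-character elements. The quick JavaScript check below compares + with *, assuming JavaScript and BigQuery's RE2 engine agree on a pattern this simple (they should, but that is an assumption). The helper name extractAll is made up for illustration.

```javascript
// One alphanumeric, then ONE OR MORE non-quote characters.
const plusPattern = /[0-9a-zA-Z][^"]+/g;
// One alphanumeric, then ZERO OR MORE non-quote characters.
const starPattern = /[0-9a-zA-Z][^"]*/g;

// String.prototype.match returns null when nothing matches, so fall back to [].
function extractAll(s, pattern) {
  return s.match(pattern) || [];
}

const input = '["a","b","c"]';
console.log(extractAll(input, plusPattern)); // single characters are skipped
console.log(extractAll(input, starPattern)); // single characters are matched
```

With +, an element needs at least two characters to match, so the question's sample input yields no matches at all.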

Answer 2 (score: 1)

This is much easier with a JS UDF.

CREATE TEMP FUNCTION
  JSON_EXTRACT_ARRAY(input STRING)
  RETURNS ARRAY<STRING>
  LANGUAGE js AS """  
return JSON.parse(input);
""";
WITH
  k AS (
  SELECT
    '["a","b","c"]' AS x)
SELECT
  JSON_EXTRACT_ARRAY(x) AS x
FROM
  k
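
One thing to keep in mind is that JSON.parse throws on malformed input, and an uncaught exception inside a UDF would normally fail the query. A defensive variant of the function body might look like the sketch below. The name parseStringArray and the choice to return an empty array for bad input are assumptions for illustration, not part of the answer above.

```javascript
// Hypothetical defensive version of the UDF body.
function parseStringArray(input) {
  try {
    const parsed = JSON.parse(input);
    // Only accept a top-level JSON array; anything else becomes [].
    return Array.isArray(parsed) ? parsed : [];
  } catch (e) {
    // Malformed JSON: return an empty array instead of throwing.
    return [];
  }
}

console.log(parseStringArray('["a","b","c"]'));
console.log(parseStringArray('not json'));
```

Whether silently returning an empty array is preferable to failing loudly depends on how much you trust the data in the column.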

Answer 3 (score: 1)

Recently (2020), the JSON_EXTRACT_ARRAY function was added to BigQuery standard SQL.

It makes it easy to get the expected behavior, with no UDF or tricks:

with k as (select JSON_EXTRACT_ARRAY('["a","b","c"]', '$') as x)
select unnested_x from k, unnest(x) unnested_x

This results in:

╔══════════════╗
║ "unnested_x" ║
╠══════════════╣
║     "a"      ║
║     "b"      ║
║     "c"      ║
╚══════════════╝

JSON_EXTRACT_ARRAY doc

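Answer 4 (score: 0)

This solution updates @northtree's answer and handles array members more gracefully: each member is returned as a stringified JSON object instead of an [object Object] string: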

CREATE TEMP FUNCTION
  JSON_EXTRACT_ARRAY(input STRING)
  RETURNS ARRAY<STRING>
  LANGUAGE js AS """  
return JSON.parse(input).map(x => JSON.stringify(x));
""";

with

raw as (
  select
    1 as id,
    '[{"a": 5, "b": 6}, {"a": 7}, 456]' as body
)

select
  id,
  entry,
  json_extract(entry, '$'),
  json_extract(entry, '$.a'),
  json_extract(entry, '$.b')
from
  raw,
  unnest(json_extract_array(body)) as entry
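
The difference from the plain JSON.parse version only shows up when the elements are objects. The sketch below runs both function bodies as ordinary JavaScript. Using String(x) to stand in for the way ARRAY<STRING> elements get converted is an assumption about BigQuery's coercion, chosen to reproduce the [object Object] behavior described above.

```javascript
const body = '[{"a": 5, "b": 6}, {"a": 7}, 456]';

// Plain parse, with each element coerced to a string (assumed coercion).
const coerced = JSON.parse(body).map(x => String(x));

// Re-serialize each element so that it stays valid JSON.
const stringified = JSON.parse(body).map(x => JSON.stringify(x));

console.log(coerced);     // objects collapse to "[object Object]"
console.log(stringified); // objects survive as JSON text
```

Because every entry is still valid JSON text, json_extract(entry, '$.a') in the query above has something to work with.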