json_extract_scalar fails when the key name contains parentheses

Asked: 2016-11-21 08:23:59

Tags: google-bigquery jsonpath

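I have a string field that contains raw JSON data sent from a server. However, the key contains parentheses, which seems to cause problems when I try to extract the data.

Sample data: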

{"Interview (Onsite)": "2015-04-06 16:58:28"}

Extraction attempt:

timestamp(max(json_extract_scalar(a.status_history, '$.Interview (Onsite)')))

(The max function is used because status_history is a repeated field.)

Error:

JSONPath parse error at: (Onsite)

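I have tried several of the usual ways of escaping the parentheses, but none of them got me anywhere.

I would appreciate suggestions on how to work around this. I would rather not resort to regular expressions unless I really have to.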

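1 answer:

Answer 0 (score: 0)

After enabling standard SQL (uncheck "Use Legacy SQL" under "Show Options" in the UI), you can use a quoted string as part of the JSON path. Bracket notation with a quoted key lets the path contain characters, such as spaces and parentheses, that the dot notation cannot parse. For example: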

SELECT
  CAST(JSON_EXTRACT_SCALAR(
    '{"Interview (Onsite)": "2015-04-06 16:58:28"}',
    "$['Interview (Onsite)']") AS TIMESTAMP) AS t;

Edit: since your column is an ARRAY<STRING>, you need an ARRAY subquery to apply JSON_EXTRACT_SCALAR to each element. For example:

WITH T AS (
  SELECT
    ['{"Interview (Onsite)": "2015-04-06 16:58:28"}',
     '{"Interview (Onsite)": "2015-11-16 08:09:10"}',
     '{"Interview (Onsite)": "2016-01-01 18:12:43"}']
     AS status_history UNION ALL
  SELECT
    ['{"Interview (Onsite)": "2016-06-25 07:01:45"}']
)
SELECT
  ARRAY (
    SELECT CAST(JSON_EXTRACT_SCALAR(history, "$['Interview (Onsite)']") AS TIMESTAMP)
    FROM UNNEST(status_history) AS history
  ) AS interview_times
FROM T;

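Alternatively, if you don't care about preserving the structure of the array, you can "flatten" it with a join, which returns one row for each element of status_history: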

WITH T AS (
  SELECT
    ['{"Interview (Onsite)": "2015-04-06 16:58:28"}',
     '{"Interview (Onsite)": "2015-11-16 08:09:10"}',
     '{"Interview (Onsite)": "2016-01-01 18:12:43"}']
     AS status_history UNION ALL
  SELECT
    ['{"Interview (Onsite)": "2016-06-25 07:01:45"}']
)
SELECT
  CAST(JSON_EXTRACT_SCALAR(history, "$['Interview (Onsite)']") AS TIMESTAMP)
    AS interview_time
FROM T CROSS JOIN UNNEST(status_history) AS history;
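
Since the question applied max to the repeated field, the per-element extraction can also be folded into a scalar subquery that returns only the latest timestamp for each row. The following is a minimal sketch in standard SQL, not part of the original answer; the sample rows are shortened from the ones above and the alias latest_interview_time is a hypothetical name chosen for illustration.

```sql
-- Sketch: latest "Interview (Onsite)" timestamp per row (standard SQL).
-- The alias latest_interview_time is illustrative, not from the original answer.
WITH T AS (
  SELECT
    ['{"Interview (Onsite)": "2015-04-06 16:58:28"}',
     '{"Interview (Onsite)": "2016-01-01 18:12:43"}']
     AS status_history UNION ALL
  SELECT
    ['{"Interview (Onsite)": "2016-06-25 07:01:45"}']
)
SELECT
  (
    -- Scalar subquery: parse every element of this row's array, keep the MAX.
    SELECT MAX(CAST(JSON_EXTRACT_SCALAR(history, "$['Interview (Onsite)']") AS TIMESTAMP))
    FROM UNNEST(status_history) AS history
  ) AS latest_interview_time
FROM T;
```

Array elements that lack the key yield NULL from JSON_EXTRACT_SCALAR, and MAX ignores NULL values, so such elements should not affect the result.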

See also the section of the migration guide on handling of repeated fields.