Extract an indexed value from an array in BigQuery JSON data

Date: 2020-02-26 10:37:23

Tags: arrays json google-bigquery

I have imported data from Firestore into BigQuery. My data is structured similarly to this:

data = [
  {
    id: "item1",
    status: {
      options: [
        {
          title: "Approved",
          color: "#00ff00"
        },
        {
          title: "Rejected",
          color: "#ff0000"
        },
        {
          title: "Pending",
          color: "#ffaa00"
        }
      ],
      optionIndex: 0
    }
  },
  {
    id: "item2",
    status: {
      options: [
        {
          title: "Validated",
          color: "#00ff00"
        },
        {
          title: "Invalidated",
          color: "#ff0000"
        }
      ],
      optionIndex: 1
    }
  }
];

I can successfully run a query that extracts key values such as id:

SELECT
  JSON_EXTRACT(data, '$.id') AS item_id,
  JSON_EXTRACT(data, '$.status.optionIndex') AS option_index
FROM `my_bigquery_table`
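One detail of the query above affects the output columns: JSON_EXTRACT returns the value re-encoded as JSON text, so a string such as item1 comes back with its double quotes, whereas JSON_EXTRACT_SCALAR returns the bare scalar (and NULL for objects and arrays). The JavaScript sketch below emulates both locally, assuming simple dotted paths like '$.status.optionIndex' with no array subscripts; the helper names getPath, jsonExtract and jsonExtractScalar are hypothetical and are not BigQuery APIs.

```javascript
// Hypothetical helper: walk a simple dotted JSONPath such as '$.status.optionIndex'.
// Only '$.key.key' paths are handled; array subscripts are not supported here.
function getPath(jsonString, path) {
  return path.slice(2).split('.').reduce(
    (node, key) => (node == null ? undefined : node[key]),
    JSON.parse(jsonString)
  );
}

// Rough stand-in for JSON_EXTRACT: the value as JSON text (strings keep their quotes).
function jsonExtract(jsonString, path) {
  const value = getPath(jsonString, path);
  return value === undefined ? null : JSON.stringify(value);
}

// Rough stand-in for JSON_EXTRACT_SCALAR: scalars as unquoted text,
// objects, arrays, JSON null and missing paths as null (SQL NULL).
function jsonExtractScalar(jsonString, path) {
  const value = getPath(jsonString, path);
  if (value === undefined || value === null || typeof value === 'object') return null;
  return String(value);
}

// One row's `data` column, assumed to hold a single item as a JSON string.
const row = JSON.stringify({
  id: 'item1',
  status: { options: [{ title: 'Approved', color: '#00ff00' }], optionIndex: 0 },
});

console.log(jsonExtract(row, '$.id'));                // "item1" (quotes included)
console.log(jsonExtractScalar(row, '$.id'));          // item1
console.log(jsonExtract(row, '$.status.optionIndex')); // 0
```

This is why the answer switches to JSON_EXTRACT_SCALAR for the id, title and color columns.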

However, I'm struggling to find a way to select status.options[status.optionIndex] so that I can get the status title and color into a table. The result I'm after is:

id,status_title,status_color
item1,Approved,#00ff00
item2,Invalidated,#ff0000
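Outside BigQuery, the wanted value is just the element of status.options at the zero-based position status.optionIndex. The plain-JavaScript sketch below applies that lookup to the sample data to make the target rows concrete; toRows is a hypothetical helper, and the data is the question's sample rewritten as object literals. With this sample, the option selected for item2 is Invalidated with color #ff0000.

```javascript
// The question's sample data as plain JavaScript objects.
const data = [
  {
    id: 'item1',
    status: {
      options: [
        { title: 'Approved', color: '#00ff00' },
        { title: 'Rejected', color: '#ff0000' },
        { title: 'Pending', color: '#ffaa00' },
      ],
      optionIndex: 0,
    },
  },
  {
    id: 'item2',
    status: {
      options: [
        { title: 'Validated', color: '#00ff00' },
        { title: 'Invalidated', color: '#ff0000' },
      ],
      optionIndex: 1,
    },
  },
];

// Hypothetical helper: for each item, pick status.options[status.optionIndex]
// (zero-based) and emit one CSV line: id,status_title,status_color.
function toRows(items) {
  return items.map(({ id, status }) => {
    const selected = status.options[status.optionIndex];
    return `${id},${selected.title},${selected.color}`;
  });
}

console.log(['id,status_title,status_color', ...toRows(data)].join('\n'));
```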

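(I'm comfortable with most basic SQL, including joins, but turning the status array into a queryable structure where I can select an element by index is beyond my level.)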

1 Answer:

Answer 0 (score: 2)

The following is for BigQuery Standard SQL:

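#standardSQL
-- JavaScript UDF: parse a JSON array string and return each element
-- re-serialized as its own JSON string, giving an ARRAY<STRING> that OFFSET can index
CREATE TEMP FUNCTION json2array(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS '''
  return JSON.parse(input).map(x=>JSON.stringify(x));
''';
SELECT
  JSON_EXTRACT_SCALAR(data, '$.id') AS id,
  JSON_EXTRACT_SCALAR(option, '$.title') AS status_title,
  JSON_EXTRACT_SCALAR(option, '$.color') AS status_color
FROM `project.dataset.my_bigquery_table`,
-- Take the element of status.options at position status.optionIndex (zero-based),
-- wrap it in a one-element array and UNNEST it so it is available as `option`
UNNEST([json2array(JSON_EXTRACT(data, '$.status.options'))[OFFSET(
  CAST(JSON_EXTRACT_SCALAR(data, '$.status.optionIndex') AS INT64)
)]]) option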
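JSON_EXTRACT returns a single STRING, which cannot be subscripted, so the answer's UDF turns the options array into an ARRAY<STRING> first. The sketch below runs the UDF's JavaScript body locally in Node.js and emulates BigQuery's zero-based OFFSET (error when out of range) and SAFE_OFFSET (NULL instead). offset and safeOffset are hypothetical stand-ins, not BigQuery functions, and optionsJson is assumed to look roughly like what JSON_EXTRACT would pass to the UDF for item2.

```javascript
// The json2array UDF body from the answer, as an ordinary function.
function json2array(input) {
  return JSON.parse(input).map(x => JSON.stringify(x));
}

// Hypothetical stand-in for arr[OFFSET(i)]: zero-based, throws when out of range.
function offset(arr, i) {
  if (!Number.isInteger(i) || i < 0 || i >= arr.length) {
    throw new RangeError(`Array index ${i} is out of bounds`);
  }
  return arr[i];
}

// Hypothetical stand-in for arr[SAFE_OFFSET(i)]: returns null (SQL NULL) when out of range.
function safeOffset(arr, i) {
  return Number.isInteger(i) && i >= 0 && i < arr.length ? arr[i] : null;
}

// Assumed shape of JSON_EXTRACT(data, '$.status.options') for item2.
const optionsJson =
  '[{"title":"Validated","color":"#00ff00"},{"title":"Invalidated","color":"#ff0000"}]';
// JSON_EXTRACT_SCALAR yields the string '1'; Number() plays the role of CAST(... AS INT64).
const optionIndex = Number('1');

const elements = json2array(optionsJson);
console.log(elements.length);               // 2
console.log(offset(elements, optionIndex)); // {"title":"Invalidated","color":"#ff0000"}
console.log(safeOffset(elements, 5));       // null
```

As a design note: because the query uses OFFSET, an optionIndex past the end of its options array should make the whole query fail; swapping in SAFE_OFFSET should, as far as I know, keep such a row with NULL status_title and status_color instead.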

When applied to the sample data from the question, the output is:

Row id      status_title    status_color     
1   item1   Approved        #00ff00  
2   item2   Invalidated     #ff0000