Converting arrays into nested fields in BigQuery

Time: 2019-02-25 13:04:25

Tags: google-bigquery

I am streaming Stackdriver logs into BigQuery, and they end up in the textPayload field in the following format:


member_id_hashed=123456789,

member_age -> Float(37.0,244),

operations=[92967,93486,86220,92814,92943,93279,...],

scores=[3.214899,2.3641025E-5,2.5823574,2.3818345,3.9919448,0.0,...],

[etc]

I then define a query/view on the table that holds the raw log entries, as follows:

SELECT
member_id_hashed as member_id, member_age,
split(operations,',') as operation,
split(scores,',') as score 
FROM
(
  SELECT
  REGEXP_EXTRACT(textPayload, r'member_id_hashed=([0-9]+)') as member_id_hashed,
  REGEXP_EXTRACT(textPayload, r'member_age -> Float\(([0-9]+)') as member_age,
  REGEXP_EXTRACT(textPayload, r'operations=\[(.+)') as operations,
  REGEXP_EXTRACT(textPayload, r'scores=\[(.+)') as scores
  from `myproject.mydataset.mytable`
)

Each row of the result contains two scalar fields and two arrays.

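Ideally, for further analysis, I would like either to nest the two arrays (e.g. operation.id and operation.score) or to flatten them row by row while preserving position (i.e. element 1 of array 1 should appear next to element 1 of array 2, and so on).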

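Can anyone point me to a way to build nested fields from the arrays, or to flatten them? I tried unnesting and joining, but that produces too many cross combinations in the result.

Thanks for your help!

1 Answer:

Answer 0 (score: 1)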

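You can zip the two arrays like this. The query unnests the array of operation IDs together with each element's index, then picks the element at the same index from the scores array. Note that this assumes both arrays have the same number of elements. If they do not, use SAFE_OFFSET instead of OFFSET to get NULL rather than an error, for example when there are more IDs than scores.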

SELECT
  member_id_hashed, member_age,
  ARRAY(
    SELECT AS STRUCT id, split(scores,',')[OFFSET(off)] AS score
    FROM UNNEST(split(operations,',')) AS id WITH OFFSET off
    ORDER BY off
  ) AS operations
FROM (
  SELECT
    REGEXP_EXTRACT(textPayload, r'member_id_hashed=([0-9]+)') as member_id_hashed,
    REGEXP_EXTRACT(textPayload, r'member_age -> Float\(([0-9]+)') as member_age,
    REGEXP_EXTRACT(textPayload, r'operations=\[(.+)') as operations,
    REGEXP_EXTRACT(textPayload, r'scores=\[(.+)') as scores
  from `myproject.mydataset.mytable`
)
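
As a rough sketch of the SAFE_OFFSET variant mentioned above, the query below pairs the elements the same way but returns them flattened, one row per element, which was the other layout asked about. The `parsed` CTE and its literal values are hypothetical stand-ins for the output of the REGEXP_EXTRACT subquery. It assumes the extracted strings no longer contain the closing bracket, and the casts to INT64 and FLOAT64 are an assumption about the desired column types.

```sql
-- Hypothetical sample data standing in for the parsed log fields.
-- The second member deliberately has more IDs than scores.
WITH parsed AS (
  SELECT
    '123456789' AS member_id_hashed,
    '92967,93486,86220' AS operations,
    '3.214899,2.3641025E-5,2.5823574' AS scores
  UNION ALL
  SELECT '987654321', '92814,92943', '2.3818345'
)
SELECT
  member_id_hashed,
  off AS position,
  CAST(id AS INT64) AS operation_id,
  -- SAFE_OFFSET yields NULL when there is no score at this position,
  -- where OFFSET would raise an out-of-bounds error.
  CAST(SPLIT(scores, ',')[SAFE_OFFSET(off)] AS FLOAT64) AS score
FROM parsed,
  UNNEST(SPLIT(operations, ',')) AS id WITH OFFSET off
ORDER BY member_id_hashed, position
```

Because the UNNEST is correlated with its own row and the score is looked up by offset, each ID is paired only with the score at the same position, so no cross combinations appear. With the sample data, the last ID of the second member gets a NULL score.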