I am streaming Stackdriver logs into BigQuery, and they end up in the textPayload field in the following format:
member_id_hashed=123456789,
member_age -> Float(37.0,244),
operations=[92967,93486,86220,92814,92943,93279,...],
scores=[3.214899,2.3641025E-5,2.5823574,2.3818345,3.9919448,0.0,...],
[etc]
I then define a query/view on the table that holds the raw log entries, as follows:
SELECT
member_id_hashed as member_id, member_age,
split(operations,',') as operation,
split(scores,',') as score
FROM
(
SELECT
REGEXP_EXTRACT(textPayload, r'member_id_hashed=([0-9]+)') as member_id_hashed,
REGEXP_EXTRACT(textPayload, r'member_age -> Float\(([0-9]+)') as member_age,
REGEXP_EXTRACT(textPayload, r'operations=\[(.+)') as operations,
REGEXP_EXTRACT(textPayload, r'scores=\[(.+)') as scores
from `myproject.mydataset.mytable`
)
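A row contains two scalar fields and two arrays.
Ideally, for further analysis, I would like either to nest the two arrays (e.g. operation.id and operation.score) or to flatten them row by row while preserving position (i.e. element 1 of array 1 should appear next to element 1 of array 2, and so on).
Can anyone point me to a way of building nested fields from the arrays, or of flattening them? I tried unnesting and joining, but that yields far too many cross-combinations in the result.
Thanks for your help!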
Answer 0 (score: 1)
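You can zip the two arrays like this. The query unnests the array of operation IDs, gets the index of each element, and then selects the corresponding element of the scores array. Note that this assumes both arrays have the same number of elements. If they do not, you can use SAFE_OFFSET instead of OFFSET to get NULL where there is no counterpart, for example when there are more IDs than scores.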
SELECT
member_id_hashed, member_age,
ARRAY(
SELECT AS STRUCT id, split(scores,',')[OFFSET(off)] AS score
FROM UNNEST(split(operations,',')) AS id WITH OFFSET off
ORDER BY off
) AS operations
FROM (
SELECT
REGEXP_EXTRACT(textPayload, r'member_id_hashed=([0-9]+)') as member_id_hashed,
REGEXP_EXTRACT(textPayload, r'member_age -> Float\(([0-9]+)') as member_age,
REGEXP_EXTRACT(textPayload, r'operations=\[(.+)') as operations,
REGEXP_EXTRACT(textPayload, r'scores=\[(.+)') as scores
from `myproject.mydataset.mytable`
)
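The question also asks about flattening the arrays row by row. The same WITH OFFSET idea covers that case. Below is a minimal sketch, not a definitive implementation. It assumes a single hypothetical sample row (the `raw` CTE) in place of `myproject.mydataset.mytable`, so it can be run on its own. It also assumes a tighter pattern, `[^\]]*`, so that each capture ends at the closing bracket. The names `raw`, `parsed` and `operation_ids` are made up for illustration.

```sql
WITH raw AS (
  -- Hypothetical sample row standing in for `myproject.mydataset.mytable`.
  -- The scores array is deliberately one element shorter than operations.
  SELECT 'member_id_hashed=123456789, member_age -> Float(37.0,244), operations=[92967,93486,86220], scores=[3.214899,2.3641025E-5],' AS textPayload
),
parsed AS (
  SELECT
    REGEXP_EXTRACT(textPayload, r'member_id_hashed=([0-9]+)') AS member_id_hashed,
    REGEXP_EXTRACT(textPayload, r'member_age -> Float\(([0-9]+)') AS member_age,
    -- [^\]]* stops at the closing bracket, so only the list items are captured.
    SPLIT(REGEXP_EXTRACT(textPayload, r'operations=\[([^\]]*)\]'), ',') AS operation_ids,
    SPLIT(REGEXP_EXTRACT(textPayload, r'scores=\[([^\]]*)\]'), ',') AS scores
  FROM raw
)
SELECT
  member_id_hashed,
  member_age,
  id,
  -- SAFE_OFFSET returns NULL instead of raising an error
  -- when the scores array has no element at this position.
  scores[SAFE_OFFSET(off)] AS score
FROM parsed, UNNEST(operation_ids) AS id WITH OFFSET AS off
ORDER BY member_id_hashed, off
```

With this sample row the query returns three rows, one per operation ID. The third row has a NULL score because the sample scores array has only two elements. All values are still strings at this point. If numeric types are needed, wrapping id and score in SAFE_CAST (to INT64 and FLOAT64 respectively) is one option.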