Flatten multiple same-sized array columns in a BigQuery table

Time: 2019-11-05 10:47:30

Tags: sql google-bigquery flatten unnest

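I have a table with several columns, some of which are arrays of the same length. I would like to unnest them so that the array values end up in separate rows.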

So, starting from a table like this:

input table

I would like to get to:

output table

This is how it works for one of these array columns:

WITH data AS
(
  SELECT 1001 as id, ['a', 'b', 'c'] as array_1, [1, 2, 3] as array_2
  UNION ALL
  SELECT 1002 as id, ['d', 'e', 'f', 'g'] as array_1, [4, 5, 6, 7] as array_2
  UNION ALL
  SELECT 1003 as id, ['h', 'i'] as array_1, [8, 9] as array_2
)
SELECT id, a1
FROM data,
UNNEST(array_1) as a1

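Is there an elegant way to unnest both arrays at once? I would like to avoid unnesting each column separately and then joining everything back together.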

3 answers:

Answer 0: (score: 1)

You can use WITH OFFSET and a JOIN:

WITH data AS
(
  SELECT 1001 as id, ['a', 'b', 'c'] as array_1, [1, 2, 3] as array_2
  UNION ALL
  SELECT 1002 as id, ['d', 'e', 'f', 'g'] as array_1, [4, 5, 6, 7] as array_2
  UNION ALL
  SELECT 1003 as id, ['h', 'i'] as array_1, [8, 9] as array_2
)
SELECT id, a1, a2
FROM data cross join
     UNNEST(array_1) as a1 with offset n1 JOIN
     UNNEST(array_2) as a2 with offset n2 
     on n1 = n2
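
To see why the condition n1 = n2 pairs elements by position: WITH OFFSET exposes each element's zero-based index as an extra column. A minimal sketch on a literal array (the alias names are only illustrative):

```sql
-- WITH OFFSET adds the zero-based position of each element.
SELECT a1, n1
FROM UNNEST(['a', 'b', 'c']) AS a1 WITH OFFSET AS n1
ORDER BY n1
-- a1 | n1
-- a  | 0
-- b  | 1
-- c  | 2
```

Joining two such unnested arrays on equal offsets therefore lines up the first element with the first, the second with the second, and so on.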

Answer 1: (score: 1)

The following is for BigQuery Standard SQL:
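
One way to write this in Standard SQL, offered as a sketch rather than as this answer's original query: give both offsets the same alias (here the hypothetical name pos) and join with USING, which keeps the pairing condition short.

```sql
#standardSQL
WITH data AS
(
  SELECT 1001 AS id, ['a', 'b', 'c'] AS array_1, [1, 2, 3] AS array_2
  UNION ALL
  SELECT 1002 AS id, ['d', 'e', 'f', 'g'] AS array_1, [4, 5, 6, 7] AS array_2
  UNION ALL
  SELECT 1003 AS id, ['h', 'i'] AS array_1, [8, 9] AS array_2
)
-- Both UNNEST calls expose their element position under the same name (pos),
-- so USING (pos) pairs elements that sit at the same index.
SELECT id, a1, a2
FROM data,
     UNNEST(array_1) AS a1 WITH OFFSET AS pos
JOIN UNNEST(array_2) AS a2 WITH OFFSET AS pos
USING (pos)
ORDER BY id, pos
```

Because this is an inner join, an element without a partner at the same position would be dropped. With arrays of equal length, as in the question, every element is kept.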

Answer 2: (score: 0)

So I did some research of my own on unnesting in SQL and came up with the following solution:

WITH data AS
(
  SELECT 1001 as id, ['a', 'b', 'c'] as array_1, [1, 2, 3] as array_2
  UNION ALL
  SELECT 1002 as id, ['d', 'e', 'f', 'g'] as array_1, [4, 5, 6, 7] as array_2
  UNION ALL
  SELECT 1003 as id, ['h', 'i'] as array_1, [8, 9] as array_2
)
SELECT id, a1, array_2[OFFSET(off)] AS a2
FROM data
CROSS JOIN UNNEST(array_1) AS a1 WITH OFFSET off

The advantage is that you do not need to unnest all the arrays, only one.
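
One caveat: array_2[OFFSET(off)] raises an out-of-bounds error if array_2 is shorter than array_1. The question assumes equal lengths, but if that cannot be guaranteed, SAFE_OFFSET returns NULL instead of failing. A sketch, where the row with id 1004 is a made-up example with mismatched lengths:

```sql
WITH data AS
(
  SELECT 1001 AS id, ['a', 'b', 'c'] AS array_1, [1, 2, 3] AS array_2
  UNION ALL
  -- Hypothetical row: array_2 is one element shorter than array_1.
  SELECT 1004 AS id, ['x', 'y', 'z'] AS array_1, [10, 20] AS array_2
)
-- SAFE_OFFSET yields NULL for positions past the end of array_2.
SELECT id, a1, array_2[SAFE_OFFSET(off)] AS a2
FROM data
CROSS JOIN UNNEST(array_1) AS a1 WITH OFFSET off
ORDER BY id, off
```

For id 1004 the third row comes back with a2 as NULL rather than aborting the query.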