BigQuery: Filtering repeated fields with standard SQL

Time: 2016-12-06 16:52:43

Tags: google-bigquery

I have the following table:

row | query_params | query_values
1     foo            bar  
      param          val
2     foo            baz 

JSON:

{ 
"query_params" : [ "foo", "param"], 
"query_values" : [ "bar", "val" ] 
}, { 
"query_params" : [ "foo" ], 
"query_values" : [ "baz" ] 
}

Using standard SQL, I would like to filter the repeated fields on their values, like

SELECT * FROM table WHERE query_params = 'foo'

which would output

row | query_params | query_values
1     foo            bar  
2     foo            baz       

PS: For the same question using legacy SQL, see here.

1 Answer:

Answer 0 (score: 4)

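Have you seen the topic on differences in filtering repeated fields in the migration guide? Using your sample data as a basis, and assuming that the params and values are repeated together (rather than as separate arrays), you can write a query such as:

WITH T AS (
  SELECT 1 AS row, ARRAY<STRUCT<param STRING, value STRING>>[
    ('foo', 'bar'), ('param', 'val')] AS queries
  UNION ALL
  SELECT 2, ARRAY<STRUCT<param STRING, value STRING>>[('foo', 'baz')]
)
SELECT * EXCEPT (queries)
FROM T, UNNEST(queries)
WHERE param = 'foo';

The important part here is the comma between T and UNNEST(queries), which takes the cross product of each row of T and the elements of that row's queries array. It is equivalent to using CROSS JOIN in place of the comma.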

The query also uses EXCEPT (queries) to avoid selecting the original array in the query result, since we only want the "flattened" contents of the array.
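For comparison, here is a sketch of the same query with the comma written out as an explicit CROSS JOIN. It reuses the sample data from the answer and is meant only to illustrate the equivalence mentioned above; nothing else is changed.

```sql
-- Same sample data as in the answer: params and values repeated together.
WITH T AS (
  SELECT 1 AS row, ARRAY<STRUCT<param STRING, value STRING>>[
    ('foo', 'bar'), ('param', 'val')] AS queries
  UNION ALL
  SELECT 2, ARRAY<STRUCT<param STRING, value STRING>>[('foo', 'baz')]
)
-- Drop the original array column; keep row plus the flattened struct fields.
SELECT * EXCEPT (queries)
FROM T
CROSS JOIN UNNEST(queries)  -- explicit form of "FROM T, UNNEST(queries)"
WHERE param = 'foo';
```

Both spellings pair each row of T with the elements of its own array, so the choice between them is mostly a matter of readability.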

Edit: another sample query, this time with the params and values repeated independently:

WITH T AS (
  SELECT 1 AS row, ['foo', 'param'] AS query_params,
    ['bar', 'val'] AS query_values
  UNION ALL
  SELECT 2, ['foo'], ['baz']
)
SELECT row, query_param, query_values[OFFSET(o)] AS query_value
FROM T, UNNEST(query_params) AS query_param WITH OFFSET o
WHERE query_param = 'foo';

This uses the offset within query_params to index into query_values in parallel.
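One caveat with indexing in parallel: OFFSET raises an error when the index is out of range, which can happen if the two arrays in a row differ in length. A possible variant, assuming such rows may exist in your data, uses SAFE_OFFSET so that a missing value comes back as NULL. The third sample row below is hypothetical and was added only to show the mismatch.

```sql
WITH T AS (
  SELECT 1 AS row, ['foo', 'param'] AS query_params,
    ['bar', 'val'] AS query_values
  UNION ALL
  SELECT 2, ['foo'], ['baz']
  UNION ALL
  -- Hypothetical row: 'foo' sits at offset 1, but query_values has one element.
  SELECT 3, ['param', 'foo'], ['val']
)
SELECT
  row,
  query_param,
  -- SAFE_OFFSET returns NULL instead of failing when o is out of range.
  query_values[SAFE_OFFSET(o)] AS query_value
FROM T, UNNEST(query_params) AS query_param WITH OFFSET o
WHERE query_param = 'foo';
```

Whether a NULL or an error is preferable depends on whether mismatched arrays should be treated as bad data in your table.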