BigQuery Standard SQL: how to group by an ARRAY field

Time: 2018-02-23 00:33:02

Tags: arrays string group-by google-bigquery sql-standards

My table has two columns, id and a. Column id contains a number, and column a contains an array of strings. I want to count the number of unique ids for a given array, where two arrays are equal if they have the same size and the same string at each index.

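When I use GROUP BY a, I get the error "Grouping by expressions of type ARRAY is not allowed". I can use something like GROUP BY ARRAY_TO_STRING(a, ","), but then the two arrays ["a,b"] and ["a","b"] are grouped together, and I lose the "real" value of the array (so if I want to use it later in another query, I have to split the string).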

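The values in this array field come from users, so I cannot assume that some character will never appear in them (and use it as a separator).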
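To make the collision concrete, here is a minimal sketch with made-up data (the CTE name demo and its two rows are illustrative assumptions, not my real table). Both arrays join to the same string, so they end up in a single group:

```sql
#standardsql
WITH demo AS (
  SELECT 1 AS id, ["a,b"] AS a UNION ALL  -- one element that contains a comma
  SELECT 2, ["a", "b"]                    -- two separate elements
)
SELECT
  ARRAY_TO_STRING(a, ",") AS joined,  -- both rows produce the string a,b
  COUNT(DISTINCT id) AS cnt           -- so both ids fall into the same group
FROM demo
GROUP BY joined
```

This returns one row with cnt = 2, although the two arrays are different.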

1 answer:

Answer 0: (score: 7)

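Instead of GROUP BY ARRAY_TO_STRING(a, ","), use GROUP BY TO_JSON_STRING(a). The JSON form quotes each element and escapes special characters, so different arrays never serialize to the same string.

So your query will look like this: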

  
#standardsql
SELECT 
  TO_JSON_STRING(a) arr,
  COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY arr

You can test it with the dummy data below:

#standardsql
WITH `project.dataset.table` AS (
  SELECT 1 id, ["a,b", "c"] a UNION ALL
  SELECT 1, ["a","b,c"]
)
SELECT 
  TO_JSON_STRING(a) arr,
  COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY arr  

The result is:

Row     arr             cnt  
1       ["a,b","c"]     1    
2       ["a","b,c"]     1    

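Update based on @Ted's comment: to keep the array itself in the output rather than its JSON string, group by the JSON string and pick any array from each group with ANY_VALUE: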

#standardsql
SELECT 
  ANY_VALUE(a) a,
  COUNT(DISTINCT id) cnt
FROM `project.dataset.table`
GROUP BY TO_JSON_STRING(a)
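
One way to check the query that groups by TO_JSON_STRING(a) and selects ANY_VALUE(a) is to run it against inline data. The sketch below rests on assumptions: the three rows are made up for illustration, and the output alias arr is my own choice so that it cannot be confused with the input column a.

```sql
#standardsql
WITH `project.dataset.table` AS (
  SELECT 1 AS id, ["a,b", "c"] AS a UNION ALL
  SELECT 2, ["a", "b,c"] UNION ALL
  SELECT 3, ["a", "b,c"]          -- same array as id 2
)
SELECT
  ANY_VALUE(a) AS arr,            -- still an ARRAY<STRING>, not a JSON string
  COUNT(DISTINCT id) AS cnt
FROM `project.dataset.table`
GROUP BY TO_JSON_STRING(a)        -- the JSON string is used only as the grouping key
ORDER BY cnt
```

With this data there are two groups: the array ["a,b","c"] with cnt = 1 and the array ["a","b,c"] with cnt = 2. Because every array within a group serializes to the same JSON string, ANY_VALUE returns an equivalent array whichever row it picks.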