BigQuery - Reduce to unique records within a field

Posted: 2019-01-28 09:53:59

Tags: google-bigquery unique distinct

I have a table with fields like this:

ID    Field 1           Field 2
1     22,34,05,44,44    01,02,02,03
2     11,01,05          02,02,01,01,22

How can I transform this in BigQuery (Standard SQL) so that each field shows only its unique values, sorted from smallest to largest?

The output should then look like this:

ID    Field 1           Field 2
1     05,22,34,44       01,02,03
2     01,05,11          01,02,22

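I tried using SPLIT, but then ended up with hundreds of duplicate rows, and window functions don't support DISTINCT, so I couldn't combine the values back together afterwards.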
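One likely cause of the duplicate rows (an assumption, since the exact query that was tried isn't shown): if both fields are split and unnested in the FROM clause, every element of one array is paired with every element of the other, so each input row turns into len(Field 1) × len(Field 2) rows. The Python sketch below only simulates that row-level behaviour on the sample data; it does not run BigQuery, and the unnest_both helper is hypothetical.

```python
from itertools import product

# Sample rows from the question: (ID, Field 1, Field 2)
rows = [
    (1, "22,34,05,44,44", "01,02,02,03"),
    (2, "11,01,05", "02,02,01,01,22"),
]

def unnest_both(rows):
    """Simulate unnesting both split fields in FROM: a per-row cross product."""
    out = []
    for row_id, f1, f2 in rows:
        # Every value of Field 1 is paired with every value of Field 2
        for a, b in product(f1.split(","), f2.split(",")):
            out.append((row_id, a, b))
    return out

exploded = unnest_both(rows)
print(len(exploded))  # 5*4 + 3*5 = 35 rows from 2 input rows
```

Deduplicating each field inside its own array subquery avoids this, because no cross product between the two fields is ever formed.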

Please help me figure this out.

1 answer:

Answer 0 (score: 1)

You can split the strings into arrays, then use DISTINCT to remove duplicates and ORDER BY to sort them:

SELECT
  ID,
  -- SPLIT turns each string into an array; DISTINCT drops repeats; ORDER BY sorts ascending
  ARRAY(SELECT DISTINCT x FROM UNNEST(SPLIT(field1, ',')) AS x ORDER BY x) AS field1,
  ARRAY(SELECT DISTINCT x FROM UNNEST(SPLIT(field2, ',')) AS x ORDER BY x) AS field2
FROM `project-name`.dataset.table
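To check the logic outside BigQuery, here is a rough Python equivalent of the per-field expression ARRAY(SELECT DISTINCT x FROM UNNEST(SPLIT(field, ',')) AS x ORDER BY x). The function name distinct_sorted is hypothetical. Because x is a STRING, ORDER BY x sorts lexicographically; that matches numeric order in this data only because every value is a zero-padded two-digit string. The sketch shows both orderings, in case values of different widths ever appear.

```python
def distinct_sorted(field: str, numeric: bool = False) -> list[str]:
    """Split a comma-separated string, drop duplicates, and sort the values.

    numeric=False mimics ORDER BY x on STRING values (lexicographic order);
    numeric=True sorts by integer value while keeping the original strings.
    """
    values = set(field.split(","))        # SPLIT + DISTINCT
    key = int if numeric else None
    return sorted(values, key=key)        # ORDER BY

print(distinct_sorted("22,34,05,44,44"))            # ['05', '22', '34', '44']
print(distinct_sorted("9,10,9,100"))                # ['10', '100', '9'] (lexicographic)
print(distinct_sorted("9,10,9,100", numeric=True))  # ['9', '10', '100']
```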

If you want to turn the arrays back into comma-separated strings, use the ARRAY_TO_STRING function:

SELECT
  ID,
  -- ARRAY_TO_STRING joins the deduplicated, sorted array back into a comma-separated string
  ARRAY_TO_STRING(ARRAY(SELECT DISTINCT x FROM UNNEST(SPLIT(field1, ',')) AS x ORDER BY x), ',') AS field1,
  ARRAY_TO_STRING(ARRAY(SELECT DISTINCT x FROM UNNEST(SPLIT(field2, ',')) AS x ORDER BY x), ',') AS field2
FROM `project-name`.dataset.table
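As a quick sanity check of the expected output, the following Python sketch applies the same split, deduplicate, sort and join steps (the join standing in for ARRAY_TO_STRING(..., ',')) to the sample table from the question. It is a local simulation only; the in-memory rows list and the clean_field and clean_row helpers are illustrative assumptions, not part of BigQuery.

```python
# Sample rows from the question: (ID, Field 1, Field 2)
rows = [
    (1, "22,34,05,44,44", "01,02,02,03"),
    (2, "11,01,05", "02,02,01,01,22"),
]

def clean_field(field: str) -> str:
    """SPLIT -> DISTINCT -> ORDER BY -> ARRAY_TO_STRING, all with ',' as separator."""
    return ",".join(sorted(set(field.split(","))))

def clean_row(row):
    row_id, f1, f2 = row
    return (row_id, clean_field(f1), clean_field(f2))

result = [clean_row(r) for r in rows]
for row_id, f1, f2 in result:
    print(row_id, f1, f2)
# 1 05,22,34,44 01,02,03
# 2 01,05,11 01,02,22
```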