Aggregate the unique values of a repeated field

Asked: 2018-07-30 21:56:18

Tags: sql google-cloud-platform google-bigquery standard-sql

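My table has a repeated field. I want to run an aggregate query and get back an array of the unique values that match my query. I have tried several variations of this query: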

with t as (select * from unnest([
    STRUCT("foo" as name, ["red", "blue"] as color)
  , STRUCT("foo", ["blue"])
  , STRUCT("foo", NULL)
  , STRUCT("foo", ["green"])
  , STRUCT("bar", ["orange", "black"])
  , STRUCT("bar", ["black", "white"])
]))
select
    (select color from unnest(array_concat_agg(color))) as color
from t
group by name

The desired result is:

name  | color
=====================================
foo   | ["red", "blue", "green"]
bar   | ["orange", "black", "white"]

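This particular query fails with "Aggregate function ARRAY_CONCAT_AGG not allowed in UNNEST at [10:31]". I can't find this error in the documentation, I can't think of an intuitive reason for the restriction, and I don't know how to work around it.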

Does what I'm doing inherently require an additional level of nested query?

2 Answers:

Answer 0 (score: 3)

The following is for BigQuery Standard SQL:

#standardSQL
SELECT name, ARRAY_AGG(DISTINCT color) color
FROM `project.dataset.your_table`, UNNEST(color) color
GROUP BY name

You can test it using the dummy data from your question:

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT * FROM UNNEST([
    STRUCT("foo" AS name, ["red", "blue"] AS color)
  , STRUCT("foo", ["blue"])
  , STRUCT("foo", NULL)
  , STRUCT("foo", ["green"])
  , STRUCT("bar", ["orange", "black"])
  , STRUCT("bar", ["black", "white"])
]))
SELECT name, ARRAY_AGG(DISTINCT color) color
FROM `project.dataset.your_table`, UNNEST(color) color
GROUP BY name

The result is:

Row name    color    
1   bar     orange   
            white    
            black    
2   foo     red  
            blue     
            green    
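
The element order inside each array above is not guaranteed, because ARRAY_AGG returns elements in an arbitrary order unless one is requested. The following is a minimal sketch, assuming alphabetical order is acceptable. The CTE name `t` is taken from the question, and the alias `c` is introduced here only so the unnested element does not shadow the `color` column.

```sql
#standardSQL
WITH t AS (
  SELECT * FROM UNNEST([
    STRUCT("foo" AS name, ["red", "blue"] AS color)
  , STRUCT("foo", ["blue"])
  , STRUCT("foo", NULL)
  , STRUCT("foo", ["green"])
  , STRUCT("bar", ["orange", "black"])
  , STRUCT("bar", ["black", "white"])
]))
SELECT
  name,
  -- With DISTINCT, the ORDER BY key has to be the aggregated expression itself.
  ARRAY_AGG(DISTINCT c ORDER BY c) AS color
FROM t, UNNEST(t.color) AS c  -- comma join = CROSS JOIN: NULL or empty arrays contribute no rows
GROUP BY name
ORDER BY name
```

Because the comma join is a cross join, a name whose arrays are all NULL or empty would not appear in the output at all. That does not happen with this sample data, but it is worth keeping in mind for a real table.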

Answer 1 (score: 0)

This will get the result you want:

select t.name, array_agg(distinct color)
from (select name, array_concat_agg( color) as colors
      from t
      group by name
     ) t cross join
     unnest(colors) color
group by t.name;

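Your query has several problems. In particular, the subquery with unnest() would return multiple rows, which a scalar subquery is not allowed to do.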
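
To illustrate the multiple-rows point: a parenthesized (SELECT ...) used as an expression is a scalar subquery and must yield at most one row, whereas ARRAY(SELECT ...) gathers every row of the subquery into a single array. The sketch below is one possible way to keep the shape of the original attempt. It does the aggregation in a separate step first, since the aggregate cannot be placed inside UNNEST. The CTE name `merged` and the aliases `colors` and `c` are hypothetical names chosen for this example.

```sql
#standardSQL
WITH t AS (
  SELECT * FROM UNNEST([
    STRUCT("foo" AS name, ["red", "blue"] AS color)
  , STRUCT("foo", ["blue"])
  , STRUCT("foo", NULL)
  , STRUCT("foo", ["green"])
  , STRUCT("bar", ["orange", "black"])
  , STRUCT("bar", ["black", "white"])
])),
merged AS (
  -- Step 1: concatenate all arrays per name (NULL input arrays are ignored).
  SELECT name, ARRAY_CONCAT_AGG(color) AS colors
  FROM t
  GROUP BY name
)
SELECT
  name,
  -- Step 2: ARRAY(...) may return many rows, unlike a scalar subquery.
  ARRAY(SELECT DISTINCT c FROM UNNEST(colors) AS c) AS color
FROM merged
```

So this approach does use one extra level of query, while the ARRAY_AGG(DISTINCT ...) form in the other answer avoids it by unnesting before the aggregation.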