Applying a lookup relation to arrays in Standard SQL

Posted: 2019-02-08 22:32:50

Tags: sql google-bigquery standard-sql

I have two BigQuery tables. Table 1 has the schema {id: String, colors: Array[String]} and looks like

| id   | colors                      |
|------|-----------------------------|
| id_1 | ["blue", "green", "orange"] |
| id_2 | ["red" , "blue", "green" ]  |
| ...  | ....                        |

and Table 2 maps each color to a number, with the schema {color: String, number: Int}, and looks like

| color | number |
|-------|--------|
| "blue"| 0      |
| "red" | 1      |
| ...   | ...    |

I want to produce a table that looks like this

| id | numbers |
|----|---------|
|id_1| [0,3,4] |
|id_2| [1,0,3] |
| ...|...      |

obtained by mapping each color in Table 1 to its corresponding number. The only solution I could come up with is

SELECT id, ARRAY_AGG(number) AS numbers
FROM (table_1 CROSS JOIN UNNEST(table_1.colors) AS color) JOIN table_2 USING (color)
GROUP BY id

but this takes an extremely long time (probably because of the cross join).

3 answers:

Answer 0 (score: 0)

You can also express it like this:

SELECT id,
       (SELECT ARRAY_AGG(number)
        FROM UNNEST(table_1.colors) color JOIN
             table_2
        USING (color)
       ) AS numbers
FROM table_1;

I'm not sure whether a per-row ("local") aggregation performs better than a global aggregation in BigQuery, but it's worth a try.

Answer 1 (score: 0)

The following is for BigQuery Standard SQL:

#standardSQL
SELECT id,
  ARRAY(
    SELECT number FROM table_1.colors color 
    JOIN `project.dataset.table_2` USING (color) 
  ) AS numbers
FROM `project.dataset.table_1` table_1   

You can test it with the sample data from your question, as in the example below:

#standardSQL
WITH `project.dataset.table_1` AS (
  SELECT 'id_1' id, ["blue", "green", "orange"] colors UNION ALL
  SELECT 'id_2', ["red" , "blue", "green" ] 
), `project.dataset.table_2` AS (
  SELECT 'blue' color, 0 number UNION ALL
  SELECT 'red', 1 UNION ALL
  SELECT 'green', 3 UNION ALL
  SELECT 'orange', 4
)
SELECT id,
  ARRAY(
    SELECT number FROM table_1.colors color 
    JOIN `project.dataset.table_2` USING (color) 
  ) AS numbers
FROM `project.dataset.table_1` table_1   

Result: [screenshot of the query output not preserved]

Answer 2 (score: 0)

Something as simple as this:

select id, array_agg(number) as numbers from (
  select id, c, t2.number from table_1 t1, unnest(t1.colors) c
  join table_2 t2 on c = t2.color
)
group by 1
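None of the queries above guarantee that `numbers` lists the values in the same order as `colors`: BigQuery does not promise an element order for `ARRAY_AGG` or for an `ARRAY(SELECT ...)` subquery unless an `ORDER BY` is given. If the order matters (the expected output `[0,3,4]` suggests it does), a minimal sketch is to unnest with `WITH OFFSET` and order by that position. It reuses the sample data from Answer 1; the aliases `c`, `pos` and `t2` are illustrative choices, not part of the question. As with the other answers, colors that have no row in table_2 are dropped by the inner join.

```sql
#standardSQL
-- Sample data copied from Answer 1; replace these CTEs with the real tables.
WITH `project.dataset.table_1` AS (
  SELECT 'id_1' id, ["blue", "green", "orange"] colors UNION ALL
  SELECT 'id_2', ["red", "blue", "green"]
), `project.dataset.table_2` AS (
  SELECT 'blue' color, 0 number UNION ALL
  SELECT 'red', 1 UNION ALL
  SELECT 'green', 3 UNION ALL
  SELECT 'orange', 4
)
SELECT id,
  ARRAY(
    -- pos is the 0-based position of each color in the original array
    SELECT t2.number
    FROM UNNEST(table_1.colors) AS c WITH OFFSET AS pos
    JOIN `project.dataset.table_2` AS t2 ON t2.color = c
    ORDER BY pos  -- keep numbers in the same order as colors
  ) AS numbers
FROM `project.dataset.table_1` AS table_1
```

The per-row `ARRAY(...)` subquery avoids the global `GROUP BY` from the question, so no shuffle on `id` is needed; whether that is faster on the real data is an assumption worth measuring.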