如何使用标准SQL在BigQuery中旋转表格?

时间:2019-01-24 19:25:25

标签: google-bigquery

我正在尝试使用standardSQL在bigQuery中旋转表-这是我的输入表-

with temp as (
select "1" as id , "a" as source
union all
select "1" as id , "b" as source
union all
select "1" as id , "c" as source
union all
select "2" as id , "a" as source
union all
select "2" as id , "b" as source
union all
select "3" as id , "c" as source
union all
select "4" as id , "c" as source
)
select * from temp

我想基于source列来旋转此表,并根据派生源列的每种组合来计算records的数量。我只有3个来源-a,b and c

我的输出表应该是-

source_a, source_b, source_c, records
0,0,0, 0
0,0,1, 2
0,1,0, 0
0,1,1, 0
1,0,0, 0
1,0,1, 0
1,1,0, 1
1,1,1, 1

我尝试使用案例声明,但我认为它不起作用-

with temp as (
select "1" as id , "a" as source
union all
select "1" as id , "b" as source
union all
select "1" as id , "c" as source
union all
select "2" as id , "a" as source
union all
select "2" as id , "b" as source
union all
select "3" as id , "c" as source
union all
select "4" as id , "c" as source
)
select case when source = "a" then 1 else 0 end source_a,
case when source = "b" then 1 else 0 end source_b,
case when source = "c" then 1 else 0 end source_c,
count(*) as records
from temp
group by 1 ,2 ,3 

1 个答案:

答案 0 :(得分:1)

以下示例适用于BigQuery标准SQL

   
#standardSQL
WITH temp AS (
  SELECT "1" AS id , "a" AS source UNION ALL
  SELECT "1" AS id , "b" AS source UNION ALL
  SELECT "1" AS id , "c" AS source UNION ALL
  SELECT "2" AS id , "a" AS source UNION ALL
  SELECT "2" AS id , "b" AS source UNION ALL
  SELECT "3" AS id , "c" AS source UNION ALL
  SELECT "4" AS id , "c" AS source
), vals AS (
  SELECT 0 val UNION ALL SELECT 1
), combinations AS (
  SELECT v1.val source_a, v2.val source_b, v3.val source_c
  FROM vals v1
  CROSS JOIN vals v2
  CROSS JOIN vals v3
), facts AS (
  SELECT id,
    MAX(IF(source = 'a', 1, 0)) AS source_a,
    MAX(IF(source = 'b', 1, 0)) AS source_b,
    MAX(IF(source = 'c', 1, 0)) AS source_c
  FROM temp
  GROUP BY id
)
SELECT source_a, source_b, source_c, COUNT(id) records
FROM combinations
LEFT JOIN facts
USING (source_a, source_b, source_c)
GROUP BY source_a, source_b, source_c
ORDER BY source_a, source_b, source_c   

有结果

Row     source_a    source_b    source_c    records  
1       0           0           0           0    
2       0           0           1           2    
3       0           1           0           0    
4       0           1           1           0    
5       1           0           0           0    
6       1           0           1           0    
7       1           1           0           1    
8       1           1           1           1