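Google BigQuery SQL for totalling individual columns

Time: 2013-10-14 01:03:28

Tags: sql google-bigquery

I am looking for an efficient(ish) BigQuery SQL query to solve the following problem:

I have a table that looks like this: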


    
    Row | Col_A | Col_B |
    ---------------------
     1  |   2   |   3   |
     2  |   1   |   4   |
     3  |   5   |   7   |
     4  |   2   |   3   |
     5  |   6   |   1   |

    ...and so on (>million rows)
    

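The value in each column is an ID in the range [1..7].

The query should produce the following, i.e. for each column, a count of how many times each code occurs: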


    
    Code | Total Col_A | Total Col_B
    --------------------------------
      1  |      1      |      1
      2  |      2      |      0  
      3  |      0      |      2  
      4  |      0      |      1  
      5  |      1      |      0  
      6  |      1      |      0  
      7  |      0      |      1
    

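Does anyone know a way to do this in BigQuery without using multiple SELECTs?

Cheers.

2 answers:

Answer 0: (score: 2)

Could you create a public dataset with sample data? It would be easier to write a query that works on your data and to verify the results.

A starting query: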

    SELECT Code, COUNT(Col_A) count_column_x, COUNT(Col_B) count_column_y
    FROM [your:list.of_codes] a
    LEFT JOIN EACH [your:sample.table] b
    ON a.Code=b.Col_A
    GROUP BY 1

(This is not perfect; I can keep working on it if you share a table.)

Answer 1: (score: 1)

> Does anyone know a way to do this in BigQuery without using multiple SELECTs?

One SELECT, using standard SQL:

    #standardSQL
    WITH logs AS (
      SELECT 2 AS Col_A, 3 AS Col_B UNION ALL
      SELECT 1 AS Col_A, 4 AS Col_B UNION ALL
      SELECT 5 AS Col_A, 7 AS Col_B UNION ALL
      SELECT 2 AS Col_A, 3 AS Col_B UNION ALL
      SELECT 6 AS Col_A, 1 AS Col_B
    )
    SELECT
      id,
      SUM(CAST(id = Col_A AS INT64)) AS Total_Col_A,
      SUM(CAST(id = Col_B AS INT64)) AS Total_Col_B
    FROM logs, UNNEST(GENERATE_ARRAY(1,7)) AS id
    GROUP BY id
    ORDER BY id
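
The `SUM(CAST(... AS INT64))` form relies on a boolean casting to 1 or 0, so summing the cast comparison counts the rows where the comparison holds. A minimal sketch of that behaviour on its own, in standard SQL (the aliases `matches` and `no_match` are only illustrative):

```sql
#standardSQL
-- CAST(TRUE AS INT64) is 1 and CAST(FALSE AS INT64) is 0.
SELECT
  CAST(2 = 2 AS INT64) AS matches,   -- comparison is TRUE, so this is 1
  CAST(2 = 3 AS INT64) AS no_match   -- comparison is FALSE, so this is 0
```

If a column can contain NULL, the comparison is NULL rather than FALSE, and `SUM` skips it, so such rows are simply not counted.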

Or the same with COUNTIF():

    #standardSQL
    WITH logs AS (
      SELECT 2 AS Col_A, 3 AS Col_B UNION ALL
      SELECT 1 AS Col_A, 4 AS Col_B UNION ALL
      SELECT 5 AS Col_A, 7 AS Col_B UNION ALL
      SELECT 2 AS Col_A, 3 AS Col_B UNION ALL
      SELECT 6 AS Col_A, 1 AS Col_B
    )
    SELECT
      id,
      COUNTIF(id = Col_A) AS Total_Col_A,
      COUNTIF(id = Col_B) AS Total_Col_B
    FROM logs, UNNEST(GENERATE_ARRAY(1,7)) AS id
    GROUP BY id
    ORDER BY id
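
The `logs` CTE above only stands in for the sample rows. Against a real table, the same single SELECT might look roughly like the sketch below; the table name `your-project.your_dataset.your_table` is a placeholder rather than anything from the question, and the `Code` alias is chosen only to match the header of the desired output.

```sql
#standardSQL
-- Sketch only: replace the placeholder table name with your own.
SELECT
  id AS Code,
  COUNTIF(id = Col_A) AS Total_Col_A,   -- rows whose Col_A equals this code
  COUNTIF(id = Col_B) AS Total_Col_B    -- rows whose Col_B equals this code
FROM `your-project.your_dataset.your_table`,
  UNNEST(GENERATE_ARRAY(1, 7)) AS id    -- one copy of each row per code 1..7
GROUP BY id
ORDER BY id
```

Because every row is paired with each of the seven codes before grouping, a code that never occurs still appears in the result with a count of 0, at the cost of aggregating over seven times as many rows as the table holds.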