如何在没有手动硬编码的情况下在bigquery标准SQL中透视数据?

时间:2018-08-05 21:01:42

标签: sql google-bigquery

我有一个如下表:

| user_id | product_purchased |
-------------------------------
|    111  |        A           |
|    111  |        B           |
|    222  |        B           |
|    222  |        B           |
|    333  |        C           |
|    444  |        A           |

我想透视表以将用户ID作为行,将用户购买的每种产品的计数作为列。 因此对于上表,它看起来像:

| user_id | product A | product B | product C |
-----------------------------------------------
|    111  |     1      |      1    |     0    |
|    222  |     0      |      2    |     0    |
|    333  |     0      |      0    |     1    |
|    444  |     1      |      0    |     0    |

我知道这可以使用countif语句手动完成:

#standardsql
select user_id,
       countif(product_purchased = 'A') as 'A',
       countif(product_purchased = 'B') as 'B',
       etc,
group by user_id

但是,实际上该表具有太多可能的产品,因此无法手动写出所有选项。有没有办法以一种更加自动化和优雅的方式来实现这一点?

1 个答案:

答案 0 :(得分:3)

  

实际上,该表有太多可能的产品,无法手动将所有选项写出

以下是用于BigQuery标准SQL

您可以分两个步骤进行操作-首先通过在下面运行

来准备动态数据透视查询
#standardSQL
SELECT CONCAT('SELECT user_id, ', 
  STRING_AGG(
    CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
  ), 
  ' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
  SELECT product_purchased 
  FROM `project.dataset.your_table`  
  GROUP BY product_purchased
)

结果是,您将获得表示需要运行以获得所需结果的查询的字符串

例如,如果要应用于问题中的伪数据

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 111 user_id, 'A' product_purchased UNION ALL
  SELECT 111, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 333, 'C' UNION ALL
  SELECT 444, 'A' 
)
SELECT CONCAT('SELECT user_id, ', 
  STRING_AGG(
    CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
  ), 
  ' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
  SELECT product_purchased 
  FROM `project.dataset.your_table`  
  GROUP BY product_purchased
)

您将获得以下查询(格式为在此处查看更好)

SELECT
  user_id,
  COUNTIF(product_purchased = "A") AS product_A,
  COUNTIF(product_purchased = "B") AS product_B,
  COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id

现在,您只需运行此命令即可获得所需结果,而无需手动编码

同样,如果要针对您问题中的伪数据运行它

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 111 user_id, 'A' product_purchased UNION ALL
  SELECT 111, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 333, 'C' UNION ALL
  SELECT 444, 'A' 
)
SELECT
  user_id,
  COUNTIF(product_purchased = "A") AS product_A,
  COUNTIF(product_purchased = "B") AS product_B,
  COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id   
-- ORDER BY user_id

您获得预期的结果

Row user_id product_A   product_B   product_C    
1   111     1           1           0    
2   222     0           2           0    
3   333     0           0           1    
4   444     1           0           0    
  

有没有办法以一种更加自动化和优雅的方式来实现这一点?

您可以使用选择的任何client轻松地自动完成上述操作