I have a table like the following:
| user_id | product_purchased |
-------------------------------
| 111 | A |
| 111 | B |
| 222 | B |
| 222 | B |
| 333 | C |
| 444 | A |
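I want to pivot the table so that each user ID is a row and the count of each product the user purchased is a column. For the table above, the result would look like: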
| user_id | product A | product B | product C |
-----------------------------------------------
| 111 | 1 | 1 | 0 |
| 222 | 0 | 2 | 0 |
| 333 | 0 | 0 | 1 |
| 444 | 1 | 0 | 0 |
I know this can be done manually with COUNTIF expressions:
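#standardSQL
SELECT user_id,
  COUNTIF(product_purchased = 'A') AS A,
  COUNTIF(product_purchased = 'B') AS B
  -- ... and so on, one COUNTIF per product
FROM your_table  -- placeholder: the question does not name the table
GROUP BY user_id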
However, in practice the table has too many possible products to write out every option by hand. Is there a more automated and elegant way to do this?
Answer 0 (score: 3)
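"in practice the table has too many possible products to write out every option by hand"
The following is for BigQuery Standard SQL.
You can do this in two steps. First, prepare the dynamic pivot query by running the following:
#standardSQL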
SELECT CONCAT('SELECT user_id, ',
  STRING_AGG(
    CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
  ),
  ' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
  SELECT product_purchased
  FROM `project.dataset.your_table`
  GROUP BY product_purchased
)
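The result is a string containing the query you need to run to get the desired output.
For example, if you apply it to the dummy data from your question: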
#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 111 user_id, 'A' product_purchased UNION ALL
  SELECT 111, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 333, 'C' UNION ALL
  SELECT 444, 'A'
)
SELECT CONCAT('SELECT user_id, ',
  STRING_AGG(
    CONCAT('COUNTIF(product_purchased = "', product_purchased, '") AS product_', product_purchased)
  ),
  ' FROM `project.dataset.your_table` GROUP BY user_id')
FROM (
  SELECT product_purchased
  FROM `project.dataset.your_table`
  GROUP BY product_purchased
)
you will get the following query (formatted here for readability):
SELECT
  user_id,
  COUNTIF(product_purchased = "A") AS product_A,
  COUNTIF(product_purchased = "B") AS product_B,
  COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id
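Now you can simply run this query to get the desired result, without writing the columns out by hand.
Again, if you run it against the dummy data from your question: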
#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 111 user_id, 'A' product_purchased UNION ALL
  SELECT 111, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 222, 'B' UNION ALL
  SELECT 333, 'C' UNION ALL
  SELECT 444, 'A'
)
SELECT
  user_id,
  COUNTIF(product_purchased = "A") AS product_A,
  COUNTIF(product_purchased = "B") AS product_B,
  COUNTIF(product_purchased = "C") AS product_C
FROM `project.dataset.your_table`
GROUP BY user_id
-- ORDER BY user_id
you get the expected result:
Row user_id product_A product_B product_C
1 111 1 1 0
2 222 0 2 0
3 333 0 0 1
4 444 1 0 0
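As a cross-check of what the pivot computes, the same counts can be reproduced in plain Python on the six sample rows from the question. This is only an illustrative sketch that runs locally and does not touch BigQuery; the variable names are made up for the example.

```python
from collections import Counter

# The six sample rows from the question: (user_id, product_purchased).
rows = [(111, "A"), (111, "B"), (222, "B"), (222, "B"), (333, "C"), (444, "A")]

# Count how often each (user, product) pair occurs in the input.
counts = Counter(rows)
products = sorted({product for _, product in rows})
users = sorted({user for user, _ in rows})

# One output row per user, one count column per product (0 when absent).
pivot = {user: [counts[(user, p)] for p in products] for user in users}

for user, values in pivot.items():
    print(user, *values)
```

Each printed line corresponds to one row of the result table above, with columns in the order A, B, C.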
"Is there a more automated and elegant way to do this?"
You can easily automate the steps above using any client of your choice.
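One possible way to automate the two steps is sketched below in Python, assuming the google-cloud-bigquery client library is the chosen client and that default credentials and a default project are configured. The helper names `build_pivot_query` and `run_pivot` are hypothetical. Unlike the SQL generator above, this sketch builds the query text on the client side, and it adds two things the original does not do: it escapes quotes in product values, and it replaces characters that are not valid in an unquoted column name with underscores.

```python
import re


def build_pivot_query(table, products):
    """Return the pivot SQL text for the given product values."""
    columns = []
    for product in sorted(set(products)):
        # Escape backslashes and double quotes so the value stays a valid string literal.
        literal = product.replace("\\", "\\\\").replace('"', '\\"')
        # Assumption: characters outside [A-Za-z0-9_] become "_" in the column alias.
        alias = "product_" + re.sub(r"[^A-Za-z0-9_]", "_", product)
        columns.append(f'COUNTIF(product_purchased = "{literal}") AS {alias}')
    return (
        "SELECT user_id, " + ", ".join(columns)
        + f" FROM `{table}` GROUP BY user_id"
    )


def run_pivot(table):
    """Step 1: fetch the distinct products. Step 2: run the generated pivot query."""
    # Imported lazily so build_pivot_query can be used without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default credentials and project
    distinct_sql = f"SELECT DISTINCT product_purchased FROM `{table}`"
    products = [
        row["product_purchased"]
        for row in client.query(distinct_sql).result()
        if row["product_purchased"] is not None
    ]
    return list(client.query(build_pivot_query(table, products)).result())


if __name__ == "__main__":
    # Prints the generated query text only; no BigQuery call is made here.
    print(build_pivot_query("project.dataset.your_table", ["A", "B", "C"]))
```

For the sample products A, B and C, `build_pivot_query` yields the same single-line query as the generator step above. Note that two product values that differ only in replaced characters would map to the same alias, so a real implementation would need to handle that collision.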