I have a table with user_id and some values:
user_id | type   | amount
--------+--------+--------
user1   | credit | 15
user1   | bill   | 100
user1   | fraud  | 10000
user3   | fraud  | 1000000
My goal is to have one row per user_id:
user_id | credit | bill | fraud
--------+--------+------+--------
user1   | 15     | 100  | 10000
user3   | 0      | 0    | 1000000
I was able to write a static version with CASE, but I want to generate this part dynamically, because in some cases I have too many categories:
CASE WHEN type='credit' THEN amount ELSE 0 END AS credit,
CASE WHEN type='fraud' THEN amount ELSE 0 END AS fraud,
CASE WHEN type='bill' THEN amount ELSE 0 END AS bill
followed by MAX() and GROUP BY in an enclosing SELECT.
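To make the target concrete, here is a minimal Python sketch (Python standing in for SQL, purely for illustration) of what the grouped MAX(CASE ...) query computes on the sample rows: for each user, each category column is the maximum, over that user's rows, of amount when the type matches and 0 otherwise. The function name pivot is hypothetical.

```python
# Sample rows from the question: (user_id, type, amount).
rows = [
    ("user1", "credit", 15),
    ("user1", "bill", 100),
    ("user1", "fraud", 10000),
    ("user3", "fraud", 1000000),
]

def pivot(rows, categories):
    """Mimic: SELECT user_id, MAX(CASE WHEN type = c THEN amount ELSE 0 END) AS c, ...
    FROM rows GROUP BY user_id."""
    # GROUP BY user_id: collect each user's (type, amount) pairs.
    groups = {}
    for user_id, type_, amount in rows:
        groups.setdefault(user_id, []).append((type_, amount))
    # For each category, take MAX over the CASE value of every row in the group.
    return {
        user_id: {
            c: max(amount if type_ == c else 0 for type_, amount in members)
            for c in categories
        }
        for user_id, members in groups.items()
    }

table = pivot(rows, ["credit", "bill", "fraud"])
for user_id in sorted(table):
    print(user_id, table[user_id])
```

The ELSE 0 branch is what produces the zeros for user3's credit and bill columns in the desired output.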
If you are familiar with R: I am looking for an equivalent of model.matrix().
Edit: I am looking for a solution in SQL / Redshift. I know how to do this in R, but the data is too large to process in R.
Answer:
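As I mentioned in the question, there is a simple way to spread the values into columns and create dummy variables in SQL, but each column has to be written by hand: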
SELECT user_id,
       -- GROUP BY requires every non-grouped column to be aggregated, hence MAX()
       MAX(CASE WHEN type='credit' THEN amount ELSE 0 END) AS credit,
       MAX(CASE WHEN type='bill'   THEN amount ELSE 0 END) AS bill,
       MAX(CASE WHEN type='fraud'  THEN amount ELSE 0 END) AS fraud
FROM table1
GROUP BY user_id;
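The static pivot above only runs once each CASE is wrapped in an aggregate such as MAX(). One way to check that version against the sample data without a Redshift cluster is the following sketch, which assumes an in-memory SQLite database (via Python's standard sqlite3 module) as a stand-in; SQLite accepts the same CASE / MAX / GROUP BY syntax used here.

```python
import sqlite3

# In-memory SQLite stands in for Redshift; table1 mirrors the question's table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (user_id TEXT, type TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("user1", "credit", 15),
        ("user1", "bill", 100),
        ("user1", "fraud", 10000),
        ("user3", "fraud", 1000000),
    ],
)

# Hand-written pivot: one MAX(CASE ...) per category, grouped by user.
STATIC_QUERY = """
SELECT user_id,
       MAX(CASE WHEN type = 'credit' THEN amount ELSE 0 END) AS credit,
       MAX(CASE WHEN type = 'bill'   THEN amount ELSE 0 END) AS bill,
       MAX(CASE WHEN type = 'fraud'  THEN amount ELSE 0 END) AS fraud
FROM table1
GROUP BY user_id
ORDER BY user_id
"""

result = conn.execute(STATIC_QUERY).fetchall()
for row in result:
    print(row)
```

The ORDER BY is only there to make the row order deterministic for checking.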
I am convinced that dynamic SQL cannot be built inside Redshift, so I build the whole query in R and then send it to Redshift:
1. Get all possible values of the type column:
SELECT DISTINCT type FROM table1;
2. Build the query dynamically in R and execute it in Redshift.
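The two steps can be sketched as follows. The original answer does step 2 in R; this sketch uses Python instead and an in-memory SQLite database as a stand-in for Redshift so that it runs locally. The helper names quote_literal, quote_ident and build_pivot_query are hypothetical. Against Redshift you would run the step 1 query and send the generated string through whatever client or driver you already use, rather than sqlite3.

```python
import sqlite3

# Hypothetical helpers: escape a category value as a SQL string literal and as a
# quoted column alias, so values containing quotes or spaces stay valid SQL.
def quote_literal(value):
    return "'" + value.replace("'", "''") + "'"

def quote_ident(name):
    return '"' + name.replace('"', '""') + '"'

def build_pivot_query(table, key, category_col, value_col, categories):
    """Return SELECT key, MAX(CASE ...) AS <category>, ... FROM table GROUP BY key.

    table, key, category_col and value_col are inserted verbatim and are assumed
    to be trusted identifiers; only the category values are escaped.
    """
    cases = ",\n       ".join(
        f"MAX(CASE WHEN {category_col} = {quote_literal(c)} "
        f"THEN {value_col} ELSE 0 END) AS {quote_ident(c)}"
        for c in categories
    )
    return (
        f"SELECT {key},\n       {cases}\n"
        f"FROM {table}\nGROUP BY {key}\nORDER BY {key}"
    )

# In-memory SQLite stands in for Redshift so the sketch runs locally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (user_id TEXT, type TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("user1", "credit", 15),
        ("user1", "bill", 100),
        ("user1", "fraud", 10000),
        ("user3", "fraud", 1000000),
    ],
)

# Step 1: fetch every distinct category (sorted so the column order is stable).
categories = [
    r[0] for r in conn.execute("SELECT DISTINCT type FROM table1 ORDER BY type")
]

# Step 2: generate the pivot query from those categories and execute it.
query = build_pivot_query("table1", "user_id", "type", "amount", categories)
print(query)
cursor = conn.execute(query)
columns = [d[0] for d in cursor.description]
result = cursor.fetchall()
print(columns)
print(result)
```

Because the column list is generated from the data, adding a new category only requires re-running both steps; no SQL has to be edited by hand.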