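I am trying to create a new data frame that shows whether each promo code was used on a given date (a binary flag, not an actual sum or count), and that also totals the sales generated by the codes.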
DATA:
+---------------+------------+--------------+
| Order Date | Promo Code | Sales Amount |
+---------------+------------+--------------+
| 10-29-20 | today20 | 50 |
+---------------+------------+--------------+
| 10-29-20 | vip20 | 50 |
+---------------+------------+--------------+
| 10-29-20 | today20 | 50 |
+---------------+------------+--------------+
| 10-28-20 | vip20 | 50 |
+---------------+------------+--------------+
| 10-28-20 | vip20 | 50 |
+---------------+------------+--------------+
| 10-27-20 | pc20 | 25 |
+---------------+------------+--------------+
| 10-28-20 | | 50 |
+---------------+------------+--------------+
| 10-28-20 | vip20 | 50 |
+---------------+------------+--------------+
| 10-27-20 | | 25 |
+---------------+------------+--------------+
| .... | .... | .... |
+---------------+------------+--------------+
| .... | .... | .... |
+---------------+------------+--------------+
NEW DATAFRAME
+---------------+------------+--------------+--------------+--------------+
|Order Date | today20 | vip20 | pc20 | Sales Total |
+---------------+------------+--------------+--------------+--------------+
| 10-29-20 | 1 | 1 | 0 | 150.00 |
+---------------+------------+--------------+--------------+--------------+
| 10-28-20 | 0 | 1 | 0 | 100.00 |
+---------------+------------+--------------+--------------+--------------+
| 10-27-20 | 0 | 0 | 1 | 25.00 |
+---------------+------------+--------------+--------------+--------------+
| .... | .... | .... | .... | .... |
+---------------+------------+--------------+--------------+--------------+
| .... | .... | .... | .... | .... |
+---------------+------------+--------------+--------------+--------------+
Answer 0 (score: 0)
Use dcast.
I prefer data.table, so:
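library(data.table)
# Assumes yourdf has columns named Order_date, Promo_code and Sales_amount
dt <- setDT(yourdf)
# Drop orders without a promo code so they get no column and are left out of the total
dt <- dt[!is.na(Promo_code) & Promo_code != ""]
# One row per date and one column per promo code, each holding the summed sales
dt.wide <- dcast(data = dt, formula = Order_date ~ Promo_code, fun.aggregate = sum, value.var = "Sales_amount")
promo_cols <- setdiff(names(dt.wide), "Order_date")
# Add the Sales_total column while the promo columns still hold sums
dt.wide[, Sales_total := rowSums(.SD), .SDcols = promo_cols]
# Convert the promo code columns to binary (1 = used that day, 0 = not used)
dt.wide[, (promo_cols) := lapply(.SD, function(x) as.integer(x > 0)), .SDcols = promo_cols]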
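As a self-contained illustration of the dcast approach, here is a minimal sketch run on only the rows visible in the question (the elided "...." rows are left out). The names orders, promo_orders, wide and promo_cols are made up for this example, and the underscore column names are an assumption carried over from the answer rather than the question's actual headers.

```r
library(data.table)

# Visible sample rows from the question; the elided "...." rows are omitted.
# Column names with underscores are an assumption, not the question's headers.
orders <- data.table(
  Order_date   = c("10-29-20", "10-29-20", "10-29-20", "10-28-20", "10-28-20",
                   "10-27-20", "10-28-20", "10-28-20", "10-27-20"),
  Promo_code   = c("today20", "vip20", "today20", "vip20", "vip20",
                   "pc20", "", "vip20", ""),
  Sales_amount = c(50, 50, 50, 50, 50, 25, 50, 50, 25)
)

# Drop orders without a promo code so they get no column and add nothing to the total
promo_orders <- orders[!is.na(Promo_code) & Promo_code != ""]

# Wide table: one row per date, summed sales per promo code, 0 where a code was unused
wide <- dcast(promo_orders, Order_date ~ Promo_code,
              fun.aggregate = sum, value.var = "Sales_amount")
promo_cols <- setdiff(names(wide), "Order_date")

# Total first, while the columns still hold sums; then overwrite them with 0/1 flags
wide[, Sales_total := rowSums(.SD), .SDcols = promo_cols]
wide[, (promo_cols) := lapply(.SD, function(x) as.integer(x > 0)), .SDcols = promo_cols]

print(wide)
```

The order of the last two steps matters: once the promo columns are overwritten with flags, the sums needed for Sales_total are gone. Note that with only the visible rows, 10-28-20 totals 150 rather than the 100.00 shown in the question's expected output, so the elided rows or the intended rule may differ from what this sketch assumes.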