R: new data frame with binary columns

Date: 2020-11-09 16:54:47

Tags: r dataframe

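I am trying to create a new data frame that shows whether each promo code was used on a given date (as a binary 0/1 flag, not the actual sum/count), and that also totals the sales generated by the codes on that date.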

DATA:
+---------------+------------+--------------+
| Order Date    | Promo Code | Sales Amount | 
+---------------+------------+--------------+
| 10-29-20      |   today20  |   50         |  
+---------------+------------+--------------+
| 10-29-20      |   vip20    |   50         |   
+---------------+------------+--------------+
| 10-29-20      |   today20  |   50         |  
+---------------+------------+--------------+
| 10-28-20      |   vip20    |   50         |   
+---------------+------------+--------------+
| 10-28-20      |   vip20    |   50         |   
+---------------+------------+--------------+
| 10-27-20      |   pc20     |   25         |
+---------------+------------+--------------+
| 10-28-20      |            |   50         |   
+---------------+------------+--------------+
| 10-28-20      |   vip20    |   50         |   
+---------------+------------+--------------+
| 10-27-20      |            |   25         |
+---------------+------------+--------------+
| ....          |      ....  |   ....       |
+---------------+------------+--------------+
| ....          |      ....  |   ....       |
+---------------+------------+--------------+


NEW DATAFRAME
+---------------+------------+--------------+--------------+--------------+
|Order Date     | today20    | vip20        |  pc20        | Sales Total  |
+---------------+------------+--------------+--------------+--------------+
| 10-29-20      |   1        |   1          |    0         |  150.00      |
+---------------+------------+--------------+--------------+--------------+
| 10-28-20      |   0        |   1          |    0         |  100.00      |
+---------------+------------+--------------+--------------+--------------+
| 10-27-20      |   0        |   0          |    1         |   25.00      |
+---------------+------------+--------------+--------------+--------------+
| ....          |      ....  |   ....       | ....         |   ....       |
+---------------+------------+--------------+--------------+--------------+
| ....          |      ....  |   ....       | ....         |   ....       |
+---------------+------------+--------------+--------------+--------------+
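To make the requirement concrete, here is a minimal base-R sketch. It uses `table()` and `tapply()` rather than the `dcast` approach from the answer, rebuilds only the visible sample rows (the `....` rows are left out), and assumes underscored column names `Order_date`, `Promo_code` and `Sales_amount`. Orders with an empty promo code are treated as "no code".

```r
# Visible sample rows from the DATA table; elided "...." rows are omitted.
# Underscored column names are an assumption, matching the answer's code.
sales <- data.frame(
  Order_date   = c("10-29-20", "10-29-20", "10-29-20", "10-28-20", "10-28-20",
                   "10-27-20", "10-28-20", "10-28-20", "10-27-20"),
  Promo_code   = c("today20", "vip20", "today20", "vip20", "vip20",
                   "pc20", "", "vip20", ""),
  Sales_amount = c(50, 50, 50, 50, 50, 25, 50, 50, 25),
  stringsAsFactors = FALSE
)

# Keep only orders that actually carry a promo code
coded <- sales[sales$Promo_code != "", ]

# table() counts uses per date and code; "> 0" turns the counts into 0/1 flags
used  <- table(coded$Order_date, coded$Promo_code) > 0
flags <- as.data.frame.matrix(used * 1L)

# Sales from coded orders per date, aligned to the table's row order
totals <- tapply(coded$Sales_amount, coded$Order_date, sum)
flags$Sales_total <- as.numeric(totals[rownames(flags)])
flags
```

Here `Sales_total` counts only orders that carry a promo code; use `sales` instead of `coded` in the `tapply()` call if uncoded orders should be included.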

1 answer:

Answer 0 (score: 0)

Use dcast.

I prefer data.table, so:

library(data.table)

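dt = setDT(yourdf)  # assumes the columns are named Order_date, Promo_code, Sales_amount

#One column per promo code, holding the summed sales for that date (orders without a code are dropped)
dt.wide = dcast(data=dt[Promo_code != ""], formula = Order_date ~ Promo_code, fun.aggregate=sum, value.var='Sales_amount')

#Adds Sales_total column (sum of the promo-code columns, computed before they are turned into 0/1)
code_cols = setdiff(names(dt.wide), 'Order_date')
dt.wide[, Sales_total := rowSums(.SD), .SDcols=code_cols]

#Converts promo-code columns to binary (1 if the code was used on that date)
dt.wide[, (code_cols) := lapply(.SD, function(x) as.integer(x > 0)), .SDcols=code_cols]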
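A self-contained sketch of the same dcast steps on the visible sample rows from the question, assuming underscored column names (`Order_date`, `Promo_code`, `Sales_amount`) and treating an empty promo code as "no code". With `fun.aggregate = sum`, dcast fills date/code combinations that never occur with `sum()` of an empty vector, i.e. 0, so those flags come out as 0.

```r
library(data.table)

# Visible sample rows from the question; column names are assumed (underscored).
dt <- data.table(
  Order_date   = c("10-29-20", "10-29-20", "10-29-20", "10-28-20", "10-28-20",
                   "10-27-20", "10-28-20", "10-28-20", "10-27-20"),
  Promo_code   = c("today20", "vip20", "today20", "vip20", "vip20",
                   "pc20", "", "vip20", ""),
  Sales_amount = c(50, 50, 50, 50, 50, 25, 50, 50, 25)
)

# One column per code, holding summed sales for that date (0 when unused)
dt.wide <- dcast(dt[Promo_code != ""], Order_date ~ Promo_code,
                 fun.aggregate = sum, value.var = "Sales_amount")

code_cols <- setdiff(names(dt.wide), "Order_date")

# Total first, while the code columns still hold sales amounts
dt.wide[, Sales_total := rowSums(.SD), .SDcols = code_cols]

# Then overwrite each code column in place with a 0/1 flag
dt.wide[, (code_cols) := lapply(.SD, function(x) as.integer(x > 0)),
        .SDcols = code_cols]

# Newest date first; this is plain string order, which only matches date
# order here because all sample dates share the same month and year
setorder(dt.wide, -Order_date)
dt.wide[]
```

The order of the two `:=` steps matters: computing `Sales_total` after the 0/1 conversion would sum the flags instead of the sales. For real data, converting `Order_date` with `as.Date(Order_date, "%m-%d-%y")` before sorting avoids relying on string order.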