Data:
DB <- data.frame(orderID = c(1,2,3,4,5,6,7,8,9,10),
orderDate = c("1.1.14","1.1.14","1.1.14","1.1.14","2.1.14", "2.1.14","2.1.14","2.1.14","2.1.14","2.1.14"),
itemID = c(2,3,2,5,12,4,2,3,1,5),
price = c(29.90, 39.90, 29.90, 19.90, 49.90, 9.90, 29.90, 39.90, 14.90, 19.90),
customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
ItemReturned = c(0, 0, 0, 1, 1, 0, 1, 0, 0, 0))
Expected result:
percentageOfReturnedItemsLastOrder = c(0.33, 0.5, 0, 0.33, 0.33, 0, 0.5, 0.5, 0.33, 0.33)
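Hi guys,
Unfortunately I have another problem that I can't solve on my own, so I'd be glad if you could take a look and help me again :) In the data set, every order has its own orderID and every registered user has a unique customerID. Each customer can order items (identified by itemID) at a certain price. The column "ItemReturned" shows whether the item was sent back to the store ("1") or kept by the customer ("0"). For each customer, I want to calculate the percentage of items from their last order (the most recent orderDate) that were returned to the store. I'd also like to add the result as a new column in the existing data set...
I already tried it with data.table, but I must have made a mistake, because it doesn't work ;)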
library(data.table)
setDT(DB)[, orderDate := as.Date(orderDate, format = "%d.%m.%y")]
DB[, `:=` (percentageOfReturnedItemsLastOrder = sum(ItemRetuned> 0[orderDate == max(orderDate)])), by = customerID]
I hope you can tell me what's wrong, or show me another way to solve the problem...
Cheers and thanks!
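Answer 0 (score: 1)
Try this (it assumes orderDate has already been converted to Date, as in the question's setDT() line):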
DB[, percentageOfReturnedItemsLastOrder :=
sum(ItemReturned[orderDate == max(orderDate)])/length(ItemReturned[orderDate == max(orderDate)]),
by = customerID]
# orderID orderDate itemID price customerID ItemReturned percentageOfReturnedItemsLastOrder
# 1: 1 2014-01-01 2 29.9 1 0 0.3333333
# 2: 2 2014-01-01 3 39.9 2 0 0.5000000
# 3: 3 2014-01-01 2 29.9 3 0 0.0000000
# 4: 4 2014-01-01 5 19.9 1 1 0.3333333
# 5: 5 2014-01-02 12 49.9 1 1 0.3333333
# 6: 6 2014-01-02 4 9.9 3 0 0.0000000
# 7: 7 2014-01-02 2 29.9 2 1 0.5000000
# 8: 8 2014-01-02 3 39.9 2 0 0.5000000
# 9: 9 2014-01-02 1 14.9 1 0 0.3333333
# 10: 10 2014-01-02 5 19.9 1 0 0.3333333
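As for why the attempt in the question fails: the column name is misspelled (ItemRetuned instead of ItemReturned), the `[orderDate == max(orderDate)]` subset is attached to the literal `0` rather than to the column, and the sum is never divided by the number of items in the last order. A slightly shorter variant is sketched below; it relies on the fact that the mean of a 0/1 vector is the share of 1s, and it assumes "last order" means all rows on the customer's most recent orderDate. The data is rebuilt here (without the itemID and price columns, which the calculation does not use) only so the snippet runs on its own.

```r
library(data.table)

# Same data as in the question, minus the columns the calculation does not need
DB <- data.table(
  orderID      = 1:10,
  orderDate    = c("1.1.14", "1.1.14", "1.1.14", "1.1.14", "2.1.14",
                   "2.1.14", "2.1.14", "2.1.14", "2.1.14", "2.1.14"),
  customerID   = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
  ItemReturned = c(0, 0, 0, 1, 1, 0, 1, 0, 0, 0)
)

# Convert to Date so that max() picks the latest day, not the largest string
DB[, orderDate := as.Date(orderDate, format = "%d.%m.%y")]

# Per customer: keep only the rows of the latest order date, then take the
# mean of the 0/1 flag, which equals returned items / total items
DB[, percentageOfReturnedItemsLastOrder :=
       mean(ItemReturned[orderDate == max(orderDate)]),
   by = customerID]

print(DB)
```

Because `:=` is used together with `by`, the single value computed for each customer is recycled across all of that customer's rows, which is what produces one percentage per row in the output above.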