Data:
DB1 <- data.frame(orderItemID = c(1,2,3,4,5,6,7,8,9,10),
                  orderDate = c("1.1.12","1.1.12","1.1.12","1.1.12","1.1.12", "1.1.12","1.1.12","2.1.12","2.1.12","2.1.12"),
                  itemID = c(2,3,2,5,12,4,2,3,1,5),
                  customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1))
Expected result:
Numberoforderedproductstotal = c(5, 3, 2, 5, 5, 2, 3, 3, 5, 5)
Numberoforderedproductslastorder = c(2, 1, 2, 2, 2, 2, 1, 1, 2, 2)
Numberoforderedproductsaverage = c(2.5 , 1.5, 2, 2.5, 2.5, 2, 1.5, 1.5, 2.5, 2.5)
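Hey guys,
once again I have run into a problem I cannot solve:
In the data set, items of the same size or the same color share the same itemID. Every registered user has their own unique customerID.
I want to identify (count) the number of articles each user has ordered:
1. in total up to now (the sum of all ordered items)
2. in the last order (the sum of all ordered items in each user's most recent order [today's date is, for example, 15.1.12])
3. on average per order (the total number of ordered items divided by the number of orders)
I would also like to add the results as new columns to the existing data set...
I have already tried the "count" functions, as well as "countrep" and aggregate, but none of them worked properly...
I forgot that I would also like a fourth column with the number of orders!
The expected output is then: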
numberoforders: c(2, 2, 1, 2, 2, 1, 2, 2, 2, 2)
Many thanks for your support!
Answer 0 (score: 0)
OK, the following code seems to produce the output you want:
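library(data.table)
# Convert DB1 to a data.table by reference and parse the d.m.yy dates
setDT(DB1)[, orderDate := as.Date(orderDate, format = "%d.%m.%y")]
# Add the four per-customer columns by reference;
# an order is taken to be all rows of one customer on one date
DB1[, `:=`(Numberoforderedproductstotal = .N,
           Numberoforderedproductslastorder = length(itemID[orderDate == max(orderDate)]),
           Numberoforderedproductsaverage = .N/length(unique(orderDate)),
           Numberoforders = length(unique(orderDate))),
    by = customerID][]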
# orderItemID orderDate itemID customerID Numberoforderedproductstotal Numberoforderedproductslastorder Numberoforderedproductsaverage Numberoforders
# 1: 1 2012-01-01 2 1 5 2 2.5 2
# 2: 2 2012-01-01 3 2 3 1 1.5 2
# 3: 3 2012-01-01 2 3 2 2 2.0 1
# 4: 4 2012-01-01 5 1 5 2 2.5 2
# 5: 5 2012-01-01 12 1 5 2 2.5 2
# 6: 6 2012-01-01 4 3 2 2 2.0 1
# 7: 7 2012-01-01 2 2 3 1 1.5 2
# 8: 8 2012-01-02 3 2 3 1 1.5 2
# 9: 9 2012-01-02 1 1 5 2 2.5 2
# 10: 10 2012-01-02 5 1 5 2 2.5 2
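The dates in the sample data use a two-digit year, so the format string passed to as.Date matters. The following is only a small sketch of that point, using two date strings copied from the question; the variable names good and bad are made up for illustration.

```r
# Date strings in the question's format: day.month.two-digit year
x <- c("1.1.12", "2.1.12")

# %y reads a two-digit year, so "12" becomes 2012
good <- as.Date(x, format = "%d.%m.%y")
print(format(good))

# %Y expects the full year including the century,
# so "12" is not read as 2012 here
bad <- as.Date(x, format = "%d.%m.%Y")
print(bad)
```

Because every per-customer figure depends on comparing dates, it seems safest to parse them with %y so that the stored values are real 2012 dates.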
Answer 1 (score: 0)
You could try ave from base R:
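# Total number of items per customer
with(DB1, ave(customerID, customerID, FUN = length))
# [1] 5 3 2 5 5 2 3 3 5 5
# Parse the two-digit year with %y (%Y would read "12" as the year 12)
DB2 <- transform(DB1, date = as.Date(orderDate, '%d.%m.%y'))
# Number of items in each customer's most recent order
with(DB2, ave(as.numeric(date), customerID, FUN = function(x) sum(x == max(x))))
# [1] 2 1 2 2 2 2 1 1 2 2
# Average number of items per order
with(DB2, ave(as.numeric(date), customerID,
              FUN = function(x) sum(table(x)) / length(unique(x))))
# [1] 2.5 1.5 2.0 2.5 2.5 2.0 1.5 1.5 2.5 2.5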
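The ave calls return plain vectors, while the question asks for new columns and for a fourth value, the number of orders. A minimal base R sketch of both is given here. It assumes that one order means all rows of one customer on one date, which is how the other answers count orders. The names d and out are made up for this example.

```r
# Sample data from the question
DB1 <- data.frame(orderItemID = 1:10,
                  orderDate = c(rep("1.1.12", 7), rep("2.1.12", 3)),
                  itemID = c(2, 3, 2, 5, 12, 4, 2, 3, 1, 5),
                  customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
                  stringsAsFactors = FALSE)

# Numeric dates, so that ave() returns plain numbers
d <- as.numeric(as.Date(DB1$orderDate, format = "%d.%m.%y"))

# out is a hypothetical name for the extended copy of DB1
out <- DB1
out$Numberoforderedproductstotal <- ave(d, DB1$customerID, FUN = length)
out$Numberoforderedproductslastorder <-
  ave(d, DB1$customerID, FUN = function(x) sum(x == max(x)))
# Assumption: one order = one distinct date per customer
out$Numberoforders <-
  ave(d, DB1$customerID, FUN = function(x) length(unique(x)))
out$Numberoforderedproductsaverage <-
  out$Numberoforderedproductstotal / out$Numberoforders
print(out)
```

If two orders by the same customer can fall on the same day, counting distinct dates would undercount them, and a real order identifier would be needed instead.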
Or using dplyr (n_distinct comes from @David Arenburg's comment):
library(dplyr)
res <- DB1 %>%
  group_by(customerID) %>%
  # %y, because the years in orderDate have two digits
  mutate(orderDate = as.Date(orderDate, '%d.%m.%y'),
         Numberoforderedproductstotal = n(),
         Numberoforderedproductslastorder = sum(orderDate == max(orderDate)),
         Numberoforderedproductsaverage = n() / n_distinct(orderDate),
         Numberoforders = n_distinct(orderDate))
# Drop the four original columns to show only the new ones
as.data.frame(res)[-(1:4)]
# Numberoforderedproductstotal Numberoforderedproductslastorder
#1 5 2
#2 3 1
#3 2 2
#4 5 2
#5 5 2
#6 2 2
#7 3 1
#8 3 1
#9 5 2
#10 5 2
# Numberoforderedproductsaverage Numberoforders
#1 2.5 2
#2 1.5 2
#3 2.0 1
#4 2.5 2
#5 2.5 2
#6 2.0 1
#7 1.5 2
#8 1.5 2
#9 2.5 2
#10 2.5 2
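To see where the per-customer figures come from, it can help to look at the item count of each individual order first. This is a rough base R sketch, again assuming that an order is all rows of one customer on one date; per_order and items are names introduced here only for illustration.

```r
# Sample data from the question
DB1 <- data.frame(orderItemID = 1:10,
                  orderDate = c(rep("1.1.12", 7), rep("2.1.12", 3)),
                  itemID = c(2, 3, 2, 5, 12, 4, 2, 3, 1, 5),
                  customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
                  stringsAsFactors = FALSE)
DB1$orderDate <- as.Date(DB1$orderDate, format = "%d.%m.%y")

# One row per order (customer + date) with the number of items in it
per_order <- aggregate(orderItemID ~ customerID + orderDate,
                       data = DB1, FUN = length)
names(per_order)[3] <- "items"

# Sort by customer, then by date, so each customer's last order comes last
per_order <- per_order[order(per_order$customerID, per_order$orderDate), ]
print(per_order)
```

Customer 1, for example, has one order with 3 items and a later one with 2 items, which gives the total of 5, the last-order count of 2, the average of 2.5 and the 2 orders shown in the outputs.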