Data:
DB <- data.frame(orderID = c(1,2,3,1,1,3,2,4,5,5),
orderDate = c("1.1.12","1.1.12","1.1.12","1.1.12","1.1.12", "1.1.12","1.1.12","2.1.12","2.1.12","2.1.12"),
itemID = c(2,3,2,5,12,4,2,3,1,5),
customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99))
Expected result:
NumberofOrdersOfSpecificUser = c(2, 2, 1, 2, 2, 1, 2, 2, 2, 2)
AverageValuePerOrder = c(64.975, 19.985, 14.98, 64.975, 64.975, 14.98, 19.985, 19.985, 64.975, 64.975)
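The expected averages follow from summing each order's item prices and then averaging those order totals per customer. Customer 1 placed orders 1 (1.1.12) and 5 (2.1.12), customer 2 placed orders 2 and 4, and customer 3 placed order 3. Here is a worked check that uses only the data above:

```latex
\begin{aligned}
\text{customer 1:}\quad & \frac{(9.99 + 19.99 + 29.99) + (49.99 + 19.99)}{2} = \frac{59.97 + 69.98}{2} = 64.975 \\
\text{customer 2:}\quad & \frac{(14.99 + 9.99) + 14.99}{2} = \frac{24.98 + 14.99}{2} = 19.985 \\
\text{customer 3:}\quad & \frac{9.99 + 4.99}{1} = 14.98
\end{aligned}
```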
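Background:
The orderIDs are consecutive. Products ordered by the same customer (ID) on the same day get the same orderID. When the same customer orders products on a different day, that is a new orderID.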
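The rule above means that an order is identified by the pair (customerID, orderDate). The following is a minimal base-R sketch that checks this on the `DB` data from the question. Each (customer, date) pair should carry exactly one orderID, and each orderID should belong to exactly one pair. The names `ids.per.pair` and `pairs.per.id` are illustrative only.

```r
DB <- data.frame(orderID = c(1,2,3,1,1,3,2,4,5,5),
  orderDate = c("1.1.12","1.1.12","1.1.12","1.1.12","1.1.12","1.1.12","1.1.12","2.1.12","2.1.12","2.1.12"),
  itemID = c(2,3,2,5,12,4,2,3,1,5),
  customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
  itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99))

# For every (customerID, orderDate) pair, count the distinct orderIDs it contains
ids.per.pair <- aggregate(orderID ~ customerID + orderDate, data = DB,
                          FUN = function(x) length(unique(x)))

# For every orderID, count the distinct (customer, date) pairs it spans
pairs.per.id <- tapply(paste(DB$customerID, DB$orderDate), DB$orderID,
                       function(x) length(unique(x)))

# Both counts should be 1 everywhere if the stated rule holds
all(ids.per.pair$orderID == 1) && all(pairs.per.id == 1)
```

If the check passes, `paste(customerID, orderDate)` could serve as an order key even without the orderID column.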
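Hello,
I would like to do two things:
1. count the number of orders for each user, and
2. compute the average value per order for each user.
How can I do this?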
I have already tried this:
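# Count the distinct order dates per customer. as.numeric("1.1.12") would give NA, so keep the
# dates as character; ave() then returns a character vector, hence as.integer() on the next line
DB$NumberofOrdersOfSpecificUser <- with(DB, ave(as.character(orderDate), customerID, FUN = function(x) length(unique(x))))
DB$NumberofOrdersOfSpecificUser <- as.integer(DB$NumberofOrdersOfSpecificUser)
DB$orderDate <- as.factor(DB$orderDate)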
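The following is a minimal base-R sketch of the same `ave()` idea that produces both expected columns row by row. It relies on the one-order-per-customer-per-day rule, so it counts distinct orderIDs instead of distinct dates. It computes the average value per order as a customer's total spend divided by that customer's number of orders, which equals the mean of the order totals.

```r
DB <- data.frame(orderID = c(1,2,3,1,1,3,2,4,5,5),
  orderDate = c("1.1.12","1.1.12","1.1.12","1.1.12","1.1.12","1.1.12","1.1.12","2.1.12","2.1.12","2.1.12"),
  itemID = c(2,3,2,5,12,4,2,3,1,5),
  customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
  itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99))

# Number of distinct orders per customer, written back onto every row of that customer
DB$NumberofOrdersOfSpecificUser <- ave(DB$orderID, DB$customerID,
                                       FUN = function(x) length(unique(x)))

# Average value per order = customer's total spend / customer's number of orders
DB$AverageValuePerOrder <- ave(DB$itemPrice, DB$customerID, FUN = sum) /
  DB$NumberofOrdersOfSpecificUser

DB
```

Because `ave()` keeps the input's row order, the new columns line up with the expected vectors in the question.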
Many thanks for your support!
Answer (score: 2)
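There are, of course, many ways to do this. Your preferred result contains redundant data, because every row repeats its customer's summary. In R you don't actually need to force summary data back onto the individual records. Instead, create a new object and keep working with that.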
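# Sort the IDs so they line up with tapply(), which returns its groups in sorted order
customers <- sort(unique(DB$customerID))
my.summaries <- data.frame(customerID = customers,
  # Orders per customer = number of distinct order dates (one order per customer per day)
  NumberofOrdersOfSpecificUser = sapply(customers, function(customer) { length(unique(DB$orderDate[DB$customerID == customer])) }),
  # Sum the item prices per order, then average those order totals per customer
  AverageValuePerOrder = as.vector(tapply(tapply(DB$itemPrice, DB$orderID, sum), DB$customerID[match(sort(unique(DB$orderID)), DB$orderID)], mean))
)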
my.summaries
customerID NumberofOrdersOfSpecificUser AverageValuePerOrder
1 1 2 64.975
2 2 2 19.985
3 3 1 14.980
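For comparison, here is a sketch of the same summary built with base R's `aggregate()`. It first collapses the items into order totals and then summarizes the orders per customer. The intermediate names `order.totals`, `n.orders` and `avg.value` are illustrative and do not come from the answer.

```r
DB <- data.frame(orderID = c(1,2,3,1,1,3,2,4,5,5),
  orderDate = c("1.1.12","1.1.12","1.1.12","1.1.12","1.1.12","1.1.12","1.1.12","2.1.12","2.1.12","2.1.12"),
  itemID = c(2,3,2,5,12,4,2,3,1,5),
  customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
  itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99))

# Step 1: one row per order, holding its customer and its total value
order.totals <- aggregate(itemPrice ~ orderID + customerID, data = DB, FUN = sum)

# Step 2: per customer, count the orders and average their totals
n.orders  <- aggregate(itemPrice ~ customerID, data = order.totals, FUN = length)
avg.value <- aggregate(itemPrice ~ customerID, data = order.totals, FUN = mean)
names(n.orders)[2]  <- "NumberofOrdersOfSpecificUser"
names(avg.value)[2] <- "AverageValuePerOrder"

# Combine both per-customer summaries into one table (sorted by customerID)
my.summaries <- merge(n.orders, avg.value, by = "customerID")
my.summaries
```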
If you really do need to push the summary data back onto the individual records, use merge():
merge(DB, my.summaries)
customerID orderID orderDate itemID itemPrice NumberofOrdersOfSpecificUser AverageValuePerOrder
1 1 1 1.1.12 2 9.99 2 64.975
2 1 5 2.1.12 5 19.99 2 64.975
3 1 1 1.1.12 5 19.99 2 64.975
4 1 1 1.1.12 12 29.99 2 64.975
5 1 5 2.1.12 1 49.99 2 64.975
6 2 2 1.1.12 2 9.99 2 19.985
7 2 2 1.1.12 3 14.99 2 19.985
8 2 4 2.1.12 3 14.99 2 19.985
9 3 3 1.1.12 2 9.99 1 14.980
10 3 3 1.1.12 4 4.99 1 14.980
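`merge()` returns the rows sorted by the key column, so the result above is no longer in the original row order of `DB`. If row order matters, for example to reproduce the question's expected vectors exactly, a lookup with `match()` preserves it. This minimal sketch re-enters `my.summaries` from the output shown above.

```r
DB <- data.frame(orderID = c(1,2,3,1,1,3,2,4,5,5),
  orderDate = c("1.1.12","1.1.12","1.1.12","1.1.12","1.1.12","1.1.12","1.1.12","2.1.12","2.1.12","2.1.12"),
  itemID = c(2,3,2,5,12,4,2,3,1,5),
  customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
  itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99))

# Per-customer summary, with values taken from the my.summaries output above
my.summaries <- data.frame(customerID = c(1, 2, 3),
                           NumberofOrdersOfSpecificUser = c(2, 2, 1),
                           AverageValuePerOrder = c(64.975, 19.985, 14.98))

# Position of each row's customer in my.summaries; DB keeps its original row order
idx <- match(DB$customerID, my.summaries$customerID)
DB$NumberofOrdersOfSpecificUser <- my.summaries$NumberofOrdersOfSpecificUser[idx]
DB$AverageValuePerOrder <- my.summaries$AverageValuePerOrder[idx]
DB
```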
Edit: since the original poster added the requirement that the solution be fast, here is a fast solution using data.table:
library(data.table)
dt <- data.table(DB)
# One row per order: the customer who placed it and the order's total value
orders.per.customer <- dt[, .(orderTotal = sum(itemPrice)), by = .(orderID, customerID)]
# Per customer: number of orders (.N counts rows, i.e. orders) and mean order value
my.summaries <- orders.per.customer[, .(NumberofOrdersOfSpecificUser = .N,
                                        AverageValuePerOrder = mean(orderTotal)),
                                    by = customerID]
# Attach the per-customer summaries to every item row
dt <- merge(dt, my.summaries, by = "customerID")
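The following data.table variant skips both the intermediate table and the merge. `:=` adds the columns by reference, grouped by customer, and keeps the original row order. This sketch assumes a data.table version that provides `uniqueN()`. As in the base-R `ave()` idea, the average is the customer's total spend divided by the number of distinct orders.

```r
library(data.table)

dt <- data.table(orderID = c(1,2,3,1,1,3,2,4,5,5),
  orderDate = c("1.1.12","1.1.12","1.1.12","1.1.12","1.1.12","1.1.12","1.1.12","2.1.12","2.1.12","2.1.12"),
  itemID = c(2,3,2,5,12,4,2,3,1,5),
  customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
  itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99))

# Within each customer: count distinct orders and divide total spend by that count;
# the per-group scalars are recycled onto every row of the group
dt[, `:=`(NumberofOrdersOfSpecificUser = uniqueN(orderID),
          AverageValuePerOrder = sum(itemPrice) / uniqueN(orderID)),
   by = customerID]
dt
```

Because `:=` modifies `dt` in place, no copy of the table is made, which is usually the main speed advantage over building and merging a second object.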