Calculate the number of orders per user & the average value per order

Time: 2016-10-19 08:31:53

Tags: r

Data:

DB <- data.frame(orderID = c(1,2,3,1,1,3,2,4,5,5),    
orderDate = c("1.1.12","1.1.12","1.1.12","1.1.12","1.1.12", "1.1.12","1.1.12","2.1.12","2.1.12","2.1.12"),  
itemID = c(2,3,2,5,12,4,2,3,1,5),   
customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99, 4.99, 9.99, 14.99, 49.99, 19.99))

Expected result:

NumberofOrdersOfSpecificUser = c(2, 2, 1, 2, 2, 1, 2, 2, 2, 2) 
AverageValuePerOrder = c(64.975, 19.985, 14.98, 64.975, 64.975, 14.98, 19.985, 19.985, 64.975, 64.975)

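Background:

orderIDs are consecutive. Products ordered on the same day by the same customer (ID) get the same orderID. When the same customer orders products on another day, he/she gets a new orderID.

Hello,

I want two things: 1. calculate the number of orders per user 2. calculate the average value per order for each user

How can we do this?

I have already tried this: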
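
The rule described above (one orderID per customer per day) can be checked directly against the sample data. The following is only a minimal sketch in base R; the name `pairs_per_order` is introduced here for illustration and does not appear in the original post.

```r
# Sample data copied from the question.
DB <- data.frame(orderID = c(1,2,3,1,1,3,2,4,5,5),
                 orderDate = c("1.1.12","1.1.12","1.1.12","1.1.12","1.1.12",
                               "1.1.12","1.1.12","2.1.12","2.1.12","2.1.12"),
                 itemID = c(2,3,2,5,12,4,2,3,1,5),
                 customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
                 itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99,
                               4.99, 9.99, 14.99, 49.99, 19.99))

# For each orderID, count the distinct (customerID, orderDate) combinations.
# If the stated rule holds, every order maps to exactly one combination.
pairs_per_order <- tapply(paste(DB$customerID, DB$orderDate),
                          DB$orderID,
                          function(x) length(unique(x)))

# TRUE when every orderID belongs to a single customer on a single day.
all(pairs_per_order == 1)
```

If this check holds, counting distinct orderIDs per customer and counting distinct order dates per customer give the same number of orders.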

DB$NumberofOrdersOfSpecificUser <- with(DB, ave(as.numeric(as.factor(orderDate)), customerID, FUN = function(x) length(unique(x))))
DB$NumberofOrdersOfSpecificUser <- as.integer(DB$NumberofOrdersOfSpecificUser)
DB$orderDate <- as.factor(DB$orderDate)

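Thanks a lot for your support!

1 Answer:

Answer 0: (score: 2)

Of course, there are many ways to do this. Your desired result contains redundant data. When working in R, there is really no need to force summary data back onto the individual records; instead, create a new object and continue working with that.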

my.summaries <- data.frame(customerID = unique(DB$customerID),
                           NumberofOrdersOfSpecificUser = sapply(unique(DB$customerID), function(customer) { length(unique(DB$orderDate[which(DB$customerID == customer)])) } ),
                           AverageValuePerOrder = tapply(tapply(DB$itemPrice, DB$orderID, sum), DB$customerID[match(unique(DB$orderID), DB$orderID)], mean)
                           )

my.summaries
  customerID NumberofOrdersOfSpecificUser AverageValuePerOrder
1          1                            2               64.975
2          2                            2               19.985
3          3                            1               14.980

If you really do need to force the summary data back onto the individual records, use merge():

merge(DB, my.summaries)
   customerID orderID orderDate itemID itemPrice NumberofOrdersOfSpecificUser AverageValuePerOrder
1           1       1    1.1.12      2      9.99                            2               64.975
2           1       5    2.1.12      5     19.99                            2               64.975
3           1       1    1.1.12      5     19.99                            2               64.975
4           1       1    1.1.12     12     29.99                            2               64.975
5           1       5    2.1.12      1     49.99                            2               64.975
6           2       2    1.1.12      2      9.99                            2               19.985
7           2       2    1.1.12      3     14.99                            2               19.985
8           2       4    2.1.12      3     14.99                            2               19.985
9           3       3    1.1.12      2      9.99                            1               14.980
10          3       3    1.1.12      4      4.99                            1               14.980
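
Note that merge() returns the rows sorted by customerID, whereas the expected vectors in the question follow the original row order. As a possible alternative, here is a rough base-R sketch that attaches both columns without reordering the rows. The helper names (`order_total`, `first_row`, `n_orders`, `avg_value`, `key`) are made up for this illustration and are not part of the original answer.

```r
# Sample data copied from the question.
DB <- data.frame(orderID = c(1,2,3,1,1,3,2,4,5,5),
                 orderDate = c("1.1.12","1.1.12","1.1.12","1.1.12","1.1.12",
                               "1.1.12","1.1.12","2.1.12","2.1.12","2.1.12"),
                 itemID = c(2,3,2,5,12,4,2,3,1,5),
                 customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
                 itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99,
                               4.99, 9.99, 14.99, 49.99, 19.99))

# Total value of each order, repeated on every row of that order.
order_total <- ave(DB$itemPrice, DB$orderID, FUN = sum)

# Keep one row per order so that each order is counted only once.
first_row <- !duplicated(DB$orderID)

# Per-customer summaries, named by customerID ("1", "2", "3").
n_orders  <- tapply(DB$orderID[first_row],  DB$customerID[first_row], length)
avg_value <- tapply(order_total[first_row], DB$customerID[first_row], mean)

# Look the summaries up by name, which keeps the original row order.
key <- as.character(DB$customerID)
DB$NumberofOrdersOfSpecificUser <- as.integer(n_orders[key])
DB$AverageValuePerOrder <- as.numeric(avg_value[key])
```

Looking the values up by name rather than by position avoids depending on the order in which customers first appear in the data.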

Edit: Since the original poster added a requirement that the solution be fast, here is a fast solution using data.table:

library(data.table)
dt <- data.table(DB)
orders.per.customer <- dt[, sum(itemPrice), by="orderID,customerID"]
my.summaries <- merge(orders.per.customer[, length(orderID), by=customerID],
                      orders.per.customer[, mean(V1), by=customerID],
                      by = "customerID")
colnames(my.summaries) <- c("customerID",
                          "NumberofOrdersOfSpecificUser", "AverageValuePerOrder")
dt <- merge(dt, my.summaries, by = "customerID")
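
The same columns could probably also be added by reference, without the intermediate merge() calls. This is only a sketch, assuming a data.table version that provides uniqueN() (1.9.6 or later); the temporary column `order_total` is an illustrative name, not something from the original answer.

```r
library(data.table)

# Sample data copied from the question.
DB <- data.frame(orderID = c(1,2,3,1,1,3,2,4,5,5),
                 orderDate = c("1.1.12","1.1.12","1.1.12","1.1.12","1.1.12",
                               "1.1.12","1.1.12","2.1.12","2.1.12","2.1.12"),
                 itemID = c(2,3,2,5,12,4,2,3,1,5),
                 customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1),
                 itemPrice = c(9.99, 14.99, 9.99, 19.99, 29.99,
                               4.99, 9.99, 14.99, 49.99, 19.99))
dt <- as.data.table(DB)

# Temporary column: total value of the order each row belongs to.
dt[, order_total := sum(itemPrice), by = orderID]

# Per customer: number of distinct orders, and the mean of the order totals
# (each order counted once via !duplicated(orderID)).
dt[, `:=`(NumberofOrdersOfSpecificUser = uniqueN(orderID),
          AverageValuePerOrder = mean(order_total[!duplicated(orderID)])),
   by = customerID]

# Drop the temporary column again.
dt[, order_total := NULL]
```

Because `:=` modifies `dt` in place, the rows stay in their original order and no copy of the table is made for the join.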