I am trying to calculate the mean (average) cost for each customerID. Given the following data:
customerID <- c(1,1,1,1,2,2,2,2,3,3)
dates <- c(20130401, 20130403, 20130504, 20130508, 20130511,
20130716, 20130719, 20130723, 20130729, 20130907)
cost <- c(12, 41, 89, 45.5, 32.89, 74, 76, 12, 15.78, 10)
data <- data.frame(customerID, dates,cost)
data$dates <- as.Date(as.character(data$dates), "%Y%m%d")
# data2 <- aggregate(cbind(average_cost=cost) + customerID, data, mean)
The data looks like this:
customerID dates cost
1 20130401 12
1 20130403 41
1 20130504 89
1 20130508 45.5
2 20130511 32.89
2 20130716 74
2 20130719 76
2 20130723 12
3 20130729 15.78
3 20130907 10
How can I get output similar to the following? I can get the mean for the whole data set, but not the mean for each customerID. Thanks!
customerID average_cost
1 46.875
2 48.7225
3 12.89
Answer 0 (score: 0)
dplyr solution:
library(dplyr)
# the question's data frame is named `data`, not `df`
data %>%
  group_by(customerID) %>%
  summarise(average_cost = mean(cost))
customerID average_cost
1 1 46.8750
2 2 48.7225
3 3 12.8900
data.table solution:
library(data.table)
dt <- as.data.table(data)  # `data` is the data frame built in the question
dt[, .(average_cost = mean(cost)), by=customerID]
Alternatively, if you only want base R (note that the result column is named cost, not average_cost):
aggregate(cost ~ customerID, data = data, FUN = mean)
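If the column should be called average_cost as in the desired output, one possible base R sketch is to rename the column after aggregating. The block below rebuilds only the customerID and cost columns from the question; the names res, avg and avg_df are illustrative placeholders, not part of the original post. It also shows tapply() as another base R route to the same per-customer means.

```r
# Rebuild the relevant columns of the sample data from the question
customerID <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3)
cost <- c(12, 41, 89, 45.5, 32.89, 74, 76, 12, 15.78, 10)
data <- data.frame(customerID, cost)

# Group mean per customer, then rename the result column
res <- aggregate(cost ~ customerID, data = data, FUN = mean)
names(res)[names(res) == "cost"] <- "average_cost"
res

# Same means via tapply(): splits cost by customerID and applies mean()
avg <- tapply(data$cost, data$customerID, mean)

# tapply() returns a named vector; turn it into a two-column data frame
avg_df <- data.frame(customerID = as.numeric(names(avg)),
                     average_cost = as.vector(avg))
avg_df
```

Both approaches should give one row per customerID; aggregate() is usually the more convenient choice when grouping by several columns at once.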