cast() equivalent of an aggregate() statement

Time: 2012-07-05 20:23:07

Tags: r aggregate reshape reshape2

The aggregate function works fine for giving me average sales by month.

library(chron)
set.seed(42)
dat <- data.frame(sales = rnorm(1000, mean = 1000, sd = 40),
              dates = rep(as.Date(seq(from = 14610, to = 14859),
                              origin = "1970-01-01"),4))
aggregate(sales~months(as.chron(dates)), mean, data=dat)

...and produces the following output:

months(as.chron(dates))     sales
1                     Jan 1000.0723
2                     Feb  999.1580
3                     Mar  995.3055
4                     Apr 1000.4912
5                     May 1003.9703
6                     Jun  997.1086
7                     Jul  996.5939
8                     Aug  998.5012
9                     Sep 1001.3709

My understanding is that the following cast statement should produce the same output:

library(reshape)  # cast() comes from the reshape package
cast(dat, months(as.chron(dates)) ~ ., mean, value="sales")

But it returns the following error:

Error: Casting formula contains variables not found in molten data: months(as.chron(dates))

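I may be missing something, but is it possible to use the chron months() call inside a cast statement? The following two statements accomplish the same thing with cast(), but I am trying to do it in a single step and to better understand how cast works.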

dat$mont <- months(as.chron(dat$dates))
cast(dat, mont ~ ., mean, value="sales")

Thanks in advance, --JT

1 answer:

Answer 0 (score: 3)

This works with reshape2:

library(reshape2)
dcast(dat, months(as.chron(dates)) ~ ., mean, value.var="sales")
##   months(as.chron(dates))        NA
## 1                     Jan 1004.5404
## 2                     Feb 1002.3146
## 3                     Mar  996.0883
## 4                     Apr  994.1707
## 5                     May 1000.4652
## 6                     Jun 1002.8020
## 7                     Jul  996.0357
## 8                     Aug 1001.6754
## 9                     Sep  997.6772

Or you can use plyr:

library(plyr)
ddply(dat, .(months = months(as.chron(dates))), summarize, sales = mean(sales))
##  months     sales
## 1   Jan 1004.5404
## 2   Feb 1002.3146
## 3   Mar  996.0883
## 4   Apr  994.1707
## 5   May 1000.4652
## 6   Jun 1002.8020
## 7   Jul  996.0357
## 8   Aug 1001.6754
## 9   Sep  997.6772

Or use data.table:

library(data.table)
DT <- data.table(dat)
DT[, month := months(as.chron(dates))][,list(sales =  mean(sales)),by = month]
##    month     sales
## 1:   Jan 1004.5404
## 2:   Feb 1002.3146
## 3:   Mar  996.0883
## 4:   Apr  994.1707
## 5:   May 1000.4652
## 6:   Jun 1002.8020
## 7:   Jul  996.0357
## 8:   Aug 1001.6754
## 9:   Sep  997.6772

Comment from Matthew Dowle:

The := step isn't needed, because by accepts expressions directly:

DT[, list(sales=mean(sales)), by=months(as.chron(dates))]
##    months     sales
## 1:    Jan 1004.5404
## 2:    Feb 1002.3146
## 3:    Mar  996.0883
## 4:    Apr  994.1707
## 5:    May 1000.4652
## 6:    Jun 1002.8020
## 7:    Jul  996.0357
## 8:    Aug 1001.6754
## 9:    Sep  997.6772
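
A note on why the one-step aggregate() call in the question works: its formula interface builds the data through model.frame(), which evaluates expressions written in the formula, so a derived grouping variable does not need to exist as a column first. The sketch below checks that the one-step and two-step forms agree. It uses base R only, and format(dates, "%m") stands in for chron's months() so the result does not depend on locale or on extra packages; that substitution and the names one_step and two_step are assumptions for illustration, not part of the original posts.

```r
# Same simulated data as in the question
set.seed(42)
dat <- data.frame(sales = rnorm(1000, mean = 1000, sd = 40),
                  dates = rep(as.Date(seq(from = 14610, to = 14859),
                                      origin = "1970-01-01"), 4))

# One step: the grouping expression is evaluated inside the formula
one_step <- aggregate(sales ~ format(dates, "%m"), data = dat, FUN = mean)

# Two steps: create the grouping column first, then aggregate on it
dat$mont <- format(dat$dates, "%m")
two_step <- aggregate(sales ~ mont, data = dat, FUN = mean)

# Both forms should give the same monthly means
all.equal(one_step$sales, two_step$sales)
```

The error quoted in the question suggests that reshape's cast() instead looks up the formula terms by name among the columns of the molten data, which would explain why the precomputed mont column is needed there.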