The aggregate() call below gives me average sales by month and works as expected.
library(chron)
set.seed(42)
dat <- data.frame(sales = rnorm(1000, mean = 1000, sd = 40),
                  dates = rep(as.Date(seq(from = 14610, to = 14859),
                                      origin = "1970-01-01"), 4))
aggregate(sales~months(as.chron(dates)), mean, data=dat)
...and it produces the following output:
months(as.chron(dates)) sales
1 Jan 1000.0723
2 Feb 999.1580
3 Mar 995.3055
4 Apr 1000.4912
5 May 1003.9703
6 Jun 997.1086
7 Jul 996.5939
8 Aug 998.5012
9 Sep 1001.3709
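My understanding is that the following cast() statement (from the reshape package) should produce the same output:
library(reshape)
cast(dat, months(as.chron(dates)) ~ ., mean, value="sales")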
Instead, it returns the following error:
Error: Casting formula contains variables not found in molten data: months(as.chron(dates))
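I may be missing something, but is it possible to use the chron months() call inside a cast() formula? The following two statements do the same job with cast(), but I am trying to do it in a single step and to better understand how casting works.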
dat$mont <- months(as.chron(dat$dates))
cast(dat, mont ~ ., mean, value="sales")
Thanks in advance, --JT
Answer 0 (score: 3)
This works with reshape2:
library(reshape2)
dcast(dat, months(as.chron(dates)) ~ ., mean, value.var="sales")
## months(as.chron(dates)) NA
## 1 Jan 1004.5404
## 2 Feb 1002.3146
## 3 Mar 996.0883
## 4 Apr 994.1707
## 5 May 1000.4652
## 6 Jun 1002.8020
## 7 Jul 996.0357
## 8 Aug 1001.6754
## 9 Sep 997.6772
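If you would rather stay with the original reshape package, one possible sketch is to build the month column inline with transform(), so that the formula only refers to a real column name. This is just the two-step version from the question folded into a single expression, and it assumes that version works for you as described. The column name mont is taken from the question, and res is only an illustrative name.

```r
library(chron)
library(reshape)

# Rebuild the example data from the question.
set.seed(42)
dat <- data.frame(sales = rnorm(1000, mean = 1000, sd = 40),
                  dates = rep(as.Date(seq(from = 14610, to = 14859),
                                      origin = "1970-01-01"), 4))

# transform() returns a copy of dat with the extra column 'mont',
# so the casting formula only names a column that actually exists.
res <- cast(transform(dat, mont = months(as.chron(dates))),
            mont ~ ., mean, value = "sales")
res
```

The original dat is left unchanged, since transform() only adds the column to the copy passed to cast().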
Or you could use plyr:
library(plyr)
ddply(dat, .(months = months(as.chron(dates))), summarize, sales = mean(sales))
## months sales
## 1 Jan 1004.5404
## 2 Feb 1002.3146
## 3 Mar 996.0883
## 4 Apr 994.1707
## 5 May 1000.4652
## 6 Jun 1002.8020
## 7 Jul 996.0357
## 8 Aug 1001.6754
## 9 Sep 997.6772
Or use data.table:
library(data.table)
DT <- data.table(dat)
DT[, month := months(as.chron(dates))][,list(sales = mean(sales)),by = month]
## month sales
## 1: Jan 1004.5404
## 2: Feb 1002.3146
## 3: Mar 996.0883
## 4: Apr 994.1707
## 5: May 1000.4652
## 6: Jun 1002.8020
## 7: Jul 996.0357
## 8: Aug 1001.6754
## 9: Sep 997.6772
Following a comment from Matthew Dowle: the := step is not needed, because by accepts expressions directly:
DT[, list(sales=mean(sales)), by=months(as.chron(dates))]
## months sales
## 1: Jan 1004.5404
## 2: Feb 1002.3146
## 3: Mar 996.0883
## 4: Apr 994.1707
## 5: May 1000.4652
## 6: Jun 1002.8020
## 7: Jul 996.0357
## 8: Aug 1001.6754
## 9: Sep 997.6772
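For comparison, here is a minimal base R sketch that needs no extra packages. It groups on the two-digit month number from format(dates, "%m") instead of chron's months(), so the labels do not depend on the locale. The names monthly and month are my own choices for illustration, not something from the question.

```r
# Rebuild the example data from the question (base R only).
set.seed(42)
dat <- data.frame(sales = rnorm(1000, mean = 1000, sd = 40),
                  dates = rep(as.Date(seq(from = 14610, to = 14859),
                                      origin = "1970-01-01"), 4))

# aggregate() evaluates the expression on the right-hand side of the
# formula, so the month can be derived inside the call itself.
monthly <- aggregate(sales ~ format(dates, "%m"), data = dat, FUN = mean)
names(monthly) <- c("month", "sales")
monthly
```

The renaming step is optional; without it the grouping column keeps the full expression as its name, as in the aggregate() output in the question.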