Possible duplicate:
Beginner tips on using plyr to calculate year-over-year change across groups
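What is a good way to calculate the year-over-year difference (as a new variable) of an existing data frame variable (i.e. Sales) across multiple grouping variables (i.e. Region and Type)?
Here is an example of the data frame structure: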
Date Region Type Sales
1/1/2001 East Food 120
1/1/2001 West Housing 130
1/1/2001 North Food 130
1/2/2001 East Food 133
1/3/2001 West Housing 140
1/4/2001 North Food 150
….
….
1/29/2013 East Food 125
1/29/2013 West Housing 137
1/29/2013 North Food 1350
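In addition to differencing the data, I would also like to compute a trailing (say, 7-day) moving average.
Any guidance is greatly appreciated.
Answer 0: (score: 3)
Here is something to get you started. data.table is a great package for this kind of task because it offers concise, easy-to-use syntax for grouped operations (once you are past the learning curve).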
library(data.table)
Create a reproducible example
set.seed(128)
regions = c("East", "West", "North", "South")
types = c("Food", "Housing")
dates <- seq(as.Date('2009-01-01'), as.Date('2011-12-31'), by = 1)
n <- length(dates)
dt <- data.table(Date   = dates,
                 Region = sample(regions, n, replace = TRUE),
                 Type   = sample(types, n, replace = TRUE),
                 Sales  = round(rnorm(n, mean = 100, sd = 10)))
Add a year column
dt[, Year := year(Date)]
> dt
Date Region Type Sales Year
1: 2009-01-01 West Food 119 2009
2: 2009-01-02 North Housing 102 2009
3: 2009-01-03 North Housing 102 2009
4: 2009-01-04 North Food 101 2009
5: 2009-01-05 West Food 101 2009
---
1091: 2011-12-27 East Housing 122 2011
1092: 2011-12-28 East Housing 88 2011
1093: 2011-12-29 North Food 115 2011
1094: 2011-12-30 West Housing 96 2011
1095: 2011-12-31 East Food 101 2011
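As an aside on the trailing moving average the question asks about: a minimal sketch, assuming a data.table version that provides frollmean() (1.12.0 or later). The table ma_dt and the column name Sales.MA7 are hypothetical and the sales values are made up. Note that the window counts the last 7 rows within each group, not 7 calendar days, so it is only a true 7-day average when every group has exactly one row per day.

```r
library(data.table)

# Hypothetical daily data for a single Region/Type pair (values made up).
ma_dt <- data.table(
  Date   = seq(as.Date("2009-01-01"), by = 1, length.out = 10),
  Region = "East",
  Type   = "Food",
  Sales  = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
)

# Sort so that "trailing" means "earlier dates" within each group.
setorder(ma_dt, Region, Type, Date)

# Mean of the current row and the 6 rows before it, per group.
# The first 6 rows of each group are NA because the window is incomplete.
ma_dt[, Sales.MA7 := frollmean(Sales, n = 7, align = "right"),
      by = .(Region, Type)]

print(ma_dt)
```

If a group has gaps in its dates, one option is to fill in the missing dates first (for example with a join against a full date sequence) before taking the rolling mean.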
Compute a summary by year
summary <- dt[, list(Sales = sum(Sales)), by = 'Year,Region,Type']
setkey(summary, 'Year')
> head(summary)
Year Region Type Sales
1: 2009 West Food 4791
2: 2009 North Housing 3517
3: 2009 North Food 6774
4: 2009 South Housing 4380
5: 2009 East Food 4144
6: 2009 West Housing 4275
A function that computes the year-on-year difference for each region/type combination.
YoYdiff <- function(dt) {
  # Calculate year-on-year difference for Sales column
  data.table(Sales.Diff = diff(dt$Sales), Year = dt$Year[-1])
}
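Calculate the year-on-year difference for each Region/Type group. This works for my example because setkey(summary, 'Year') sorts the table by year, so diff() always compares consecutive years. If some years are missing for some product/region combinations in your data, you will have to be more careful.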
> summary[, YoYdiff(.SD), by = 'Region,Type']
Region Type Sales.Diff Year
1: West Food -412 2010
2: West Food 121 2011
3: North Housing 1907 2010
4: North Housing -1457 2011
5: North Food -3087 2010
6: North Food 369 2011
7: South Housing -539 2010
8: South Housing 575 2011
9: East Food 1264 2010
10: East Food -1732 2011
11: West Housing 298 2010
12: West Housing -410 2011
13: South Food -889 2010
14: South Food 1045 2011
15: East Housing 1146 2010
16: East Housing 1169 2011
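For the missing-years caveat mentioned above, one possible approach is to compare each row with the previous row only when that row really is the preceding year. This is a sketch under assumptions, not part of the original answer: the table yearly and its values are hypothetical, and fifelse() requires data.table 1.12.4 or later.

```r
library(data.table)

# Hypothetical yearly totals; East/Food has no row for 2010.
yearly <- data.table(
  Year   = c(2009, 2011, 2009, 2010, 2011),
  Region = c("East", "East", "West", "West", "West"),
  Type   = "Food",
  Sales  = c(100, 130, 200, 210, 190)
)

# Sort by year within each group so shift() looks at the previous year on record.
setorder(yearly, Region, Type, Year)

# Difference from the previous row, kept only when that row is exactly one
# year earlier; otherwise (first row of a group, or a gap) the result is NA.
yearly[, Sales.Diff := fifelse(Year - shift(Year) == 1,
                               Sales - shift(Sales),
                               NA_real_),
       by = .(Region, Type)]

print(yearly)
```

Here East/Food gets NA for 2011 because 2010 is missing, whereas a plain diff() would silently report the two-year change as if it were year-on-year.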