My data frame has two grouping variables, one time variable, and one dependent variable, e.g.:
name <- c("a", "a", "a", "a", "a", "a","a", "a", "a", "b", "b", "b","b", "b", "b","b", "b", "b")
class <- c("c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3","c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3")
year <- c("2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008")
value <- c(100, 33, 80, 90, 80, 100, 100, 90, 80, 90, 80, 100, 100, 90, 80, 99, 80, 100)
df <- data.frame(name, class, year, value)
df
I would like to apply the "diff" function within each combination of "class" and "name".
My desired output should look like this:
name class year value.1
1 a c1 2010 -67
2 a c1 2009 47
3 b c1 2010 -10
4 b c1 2009 20
...
I tried
aggregate(value~name + class, data=df, FUN="diff")
This does not produce the result I am looking for on my large data set. Thank you very much in advance!
Sebatian
Answer 0 (score: 5)
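The plyr package will be your friend. The function ddply takes a data.frame, applies a function to each defined subset, and then returns all the results recombined into a single data.frame.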
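To make the split, apply, and recombine steps concrete, here is a minimal base-R sketch of the same pattern on a cut-down version of the data. It is only an illustration of the idea, not how ddply is implemented; the names toy, pieces, diffs, and result are made up for this example.

```r
# Base-R sketch of the split-apply-combine pattern.
# `toy` is a reduced, illustrative copy of the question's data (one class only).
toy <- data.frame(
  name  = c("a", "a", "a", "b", "b", "b"),
  value = c(100, 33, 80, 90, 80, 100)
)

# Split: one sub-data-frame per level of `name`
pieces <- split(toy, toy$name)

# Apply: compute successive differences within each piece
diffs <- lapply(pieces, function(p) {
  data.frame(name = p$name[1], value = diff(p$value))
})

# Combine: stack the per-group results back into one data frame
result <- do.call(rbind, diffs)
rownames(result) <- NULL
result
```

With two grouping variables, the split step would use both columns, which is what .(class, name) expresses in the ddply call.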
The simplest solution is to use summarize with diff(value) for each combination of .(class, name):
library(plyr)
ddply(df, .(class, name), summarize, diff(value))
class name ..1
1 c1 a -67
2 c1 a 47
3 c1 b -10
4 c1 b 20
5 c2 a -10
6 c2 a 20
7 c2 b -10
8 c2 b -10
9 c3 a -10
10 c3 a -10
11 c3 b -19
12 c3 b 20
To get the years into the result, we have to get a little more involved:
ddply(df, .(class, name), summarize, year=head(year, -1), value=diff(value))
class name year value
1 c1 a 2010 -67
2 c1 a 2009 47
3 c1 b 2010 -10
4 c1 b 2009 20
5 c2 a 2010 -10
6 c2 a 2009 20
7 c2 b 2010 -10
8 c2 b 2009 -10
9 c3 a 2010 -10
10 c3 a 2009 -10
11 c3 b 2010 -19
12 c3 b 2009 20
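The reason for head(year, -1) is that diff returns one element fewer than its input, so the year column has to be shortened by one to line up with it. A small sketch on the first group of the question's data (name "a", class "c1"); the variable names d and y are only for illustration:

```r
# Values and years for name "a", class "c1", in the order they appear in df
value <- c(100, 33, 80)
year  <- c("2010", "2009", "2008")

d <- diff(value)     # successive differences: one element shorter than `value`
y <- head(year, -1)  # drop the last year so the lengths match

# Both vectors now have length 2 and can sit side by side
stopifnot(length(d) == length(y))
data.frame(year = y, value = d)
```

Note that diff works on the rows in the order they are stored, so each difference here is labelled with the first of the two years it spans. If the rows were not already ordered by year within each group, they would presumably need sorting first.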