Applying the function "diff" over groups in R

Asked: 2011-11-24 09:08:55

Tags: r plyr

I have a data frame with 2 grouping variables, 1 time variable, and one dependent variable, e.g.:

name <- c("a", "a", "a", "a", "a", "a","a", "a", "a", "b", "b", "b","b", "b", "b","b", "b", "b")
class <- c("c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3","c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3")
year <- c("2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008")
value <- c(100, 33, 80, 90, 80, 100, 100, 90, 80, 90, 80, 100, 100, 90, 80, 99, 80, 100)

df <- data.frame(name, class, year, value)
df

and I would like to apply the function "diff" within each combination of "class" and "name".

My desired output should look like this:

      name class year value.1
    1    a    c1 2010     -67
    2    a    c1 2009      47
    3    b    c1 2010     -10
    4    b    c1 2009      20
    ...

I tried

aggregate(value~name + class, data=df, FUN="diff")

but this does not produce the result I am looking for on a large data set. Thank you very much in advance!

Sebatian

1 answer:

Answer 0 (score: 5)

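The plyr package will be your friend. The function ddply takes a data.frame, applies a function to each defined subset, and then returns all the pieces recombined into a single data.frame.

The simplest solution is to use summarize with diff(value) for each combination of .(class, name):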

library(plyr)
ddply(df, .(class, name), summarize, diff(value))

   class name ..1
1     c1    a -67
2     c1    a  47
3     c1    b -10
4     c1    b  20
5     c2    a -10
6     c2    a  20
7     c2    b -10
8     c2    b -10
9     c3    a -10
10    c3    a -10
11    c3    b -19
12    c3    b  20
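
For readers who want to see the split-apply-combine steps spelled out, the following is a minimal base R sketch of roughly what the ddply call above does. It is not how plyr is implemented internally. The names pieces, diffs and result are illustrative only, and the data are rebuilt from the question so the block runs on its own.

```r
# Rebuild the example data from the question.
name  <- rep(c("a", "b"), each = 9)
class <- rep(rep(c("c1", "c2", "c3"), each = 3), times = 2)
year  <- rep(c("2010", "2009", "2008"), times = 6)
value <- c(100, 33, 80, 90, 80, 100, 100, 90, 80,
           90, 80, 100, 100, 90, 80, 99, 80, 100)
df <- data.frame(name, class, year, value)

# Split: one data frame per (name, class) combination.
# Listing name first makes it vary fastest, so the groups come out
# ordered by class and then by name.
pieces <- split(df, list(df$name, df$class), drop = TRUE)

# Apply: compute diff(value) within each piece.
diffs <- lapply(pieces, function(piece) {
  data.frame(class = piece$class[1],
             name  = piece$name[1],
             value = diff(piece$value))
})

# Combine: stack the per-group results back into one data frame.
result <- do.call(rbind, diffs)
rownames(result) <- NULL
result
```

The result has the same 12 rows as the ddply output, with the difference column named value instead of ..1.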

To get the years into the result, it gets a little more involved:

ddply(df, .(class, name), summarize, year=head(year, -1), value=diff(value))
   class name year value
1     c1    a 2010   -67
2     c1    a 2009    47
3     c1    b 2010   -10
4     c1    b 2009    20
5     c2    a 2010   -10
6     c2    a 2009    20
7     c2    b 2010   -10
8     c2    b 2009   -10
9     c3    a 2010   -10
10    c3    a 2009   -10
11    c3    b 2010   -19
12    c3    b 2009    20
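
The reason head(year, -1) is needed is that diff returns one element fewer than its input, so the year column has to be shortened to match. A small sketch on a single group (name "a", class "c1" from the question) illustrates this. The variable d is an illustrative name only.

```r
# One group from the question: name "a", class "c1".
year  <- c("2010", "2009", "2008")
value <- c(100, 33, 80)

# diff() returns value[i + 1] - value[i], so it has length(value) - 1 elements.
d <- diff(value)
length(d)

# head(year, -1) drops the last year, so both columns have the same length.
data.frame(year = head(year, -1), value = d)
```

Note that with the years sorted in descending order, the row labelled 2010 holds the 2009 value minus the 2010 value. If the opposite sign or labelling is wanted, the data could be sorted by year first, or tail(year, -1) used instead.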