How do I compute the "previous" mean of a group for each row?

Time: 2016-09-10 14:52:10

Tags: r dataframe

My data frame looks like this:

library(data.table)
df = data.table(type=rep(x=LETTERS[1:2], each=4), year=c(2009,2010,2013,2016,2003,2005,2009,2015), outcome=c(1,2,1,4,3,1,5,3))

   type year outcome
1:    A 2009       1
2:    A 2010       2
3:    A 2013       1
4:    A 2016       4
5:    B 2003       3
6:    B 2005       1
7:    B 2009       5
8:    B 2015       3

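What I want to do is, for each row, compute the previous mean of outcome, grouped by type. By "previous" I mean that, for a row r with type = A, I want the mean of outcome over all rows j with type = A and j.year < r.year.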

In this case, it would give:

   type year outcome previousMean
1:    A 2009       1            0
2:    A 2010       2            1
3:    A 2013       1          1.5
4:    A 2016       4         1.33
5:    B 2003       3            0
6:    B 2005       1            3
7:    B 2009       5            2
8:    B 2015       3            3

Thanks.

1 Answer:

Answer 0: (score: 0)

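For each 'type', we can loop over the sequence of rows, subset 'outcome' based on that sequence, take the mean, unlist, concatenate with 0, and assign (:=) to create 'previousMean'. This assumes the rows are already ordered by year within each type.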

# seq_len(.N-1) is empty for a one-row group, whereas 1:(.N-1) would yield c(1, 0)
df[, previousMean := c(0,unlist(lapply(seq_len(.N-1), 
               function(i) mean(outcome[1:i])))), by = type]
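
The snippet above relies on row order, while the question defines "previous" in terms of year. The following is a minimal sketch that applies the year condition directly, assuming the rows may arrive unsorted. The helper name prev_mean and the shuffled row order are illustrative assumptions, not part of the original question or of data.table.

```r
library(data.table)

# Same data as in the question, deliberately shuffled (assumption: input may be unsorted)
df <- data.table(type = rep(LETTERS[1:2], each = 4),
                 year = c(2009, 2010, 2013, 2016, 2003, 2005, 2009, 2015),
                 outcome = c(1, 2, 1, 4, 3, 1, 5, 3))
df <- df[c(4, 1, 7, 2, 8, 3, 5, 6)]

# prev_mean is a hypothetical helper: for each row, average 'outcome' over
# the rows of the same group whose year is strictly smaller.
# 0 is returned when no earlier row exists, matching the expected output above.
prev_mean <- function(year, outcome) {
  vapply(seq_along(year), function(i) {
    earlier <- outcome[year < year[i]]
    if (length(earlier) == 0L) 0 else mean(earlier)
  }, numeric(1))
}

df[, previousMean := prev_mean(year, outcome), by = type]
setorder(df, type, year)  # sort only for display
print(df)
```

This does quadratic work per group, which should be fine for small groups; for large groups, sorting by year first and using a cumulative mean is likely the better choice.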

Or, as another option, use cummean from dplyr:

library(dplyr)
df[, previousMean := c(0,cummean(outcome)[-.N]), by = type]
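
If loading dplyr only for cummean is not desirable, the same lagged running mean can be sketched with data.table alone. This is a sketch under the assumption that sorting the table by type and year in place (via setorder) is acceptable; it uses cumsum, seq_len and data.table's shift.

```r
library(data.table)

df <- data.table(type = rep(LETTERS[1:2], each = 4),
                 year = c(2009, 2010, 2013, 2016, 2003, 2005, 2009, 2015),
                 outcome = c(1, 2, 1, 4, 3, 1, 5, 3))

# Make sure "previous" means "earlier year" within each type
setorder(df, type, year)

# Running mean = cumulative sum / running count; shift() lags it by one row
# and fills the first row of each group with 0
df[, previousMean := shift(cumsum(outcome) / seq_len(.N), fill = 0), by = type]
print(df)
```

Like the cummean version, this treats every earlier row as "previous", so rows sharing the same year within a type would need separate handling.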

Data

df = data.table(type=rep(x=LETTERS[1:2], each=4),
                year=c(2009,2010,2013,2016,2003,2005,2009,2015), 
                outcome = c(1,2,1,4,3,1,5,3))