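Cumulative mean function fails for groups of size 1

Asked: 2016-04-09 00:06:14

Tags: r

I have a problem. I have a function that computes a cumulative mean of a field within groups, lagged by one: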

cumroll <- function(x) {
    x <- head(x, -1)
    c(head(x,1), cumsum(x) / seq_along(x))
}

As long as I apply this function to groups larger than 1, everything works fine:

Player <- c('B','B','C','C','C','D','D','D','D','E','E','E','E','E')
Team <- c('B','B','C','C','C','D','D','D','D','E','E','E','E','E')
Score <- c(2,7,3,9,6,3,7,1,7,3,8,3,4,1)
data.frame(Player, Team, Score)

test <- ave(Score, Player, Team, FUN = cumroll)
data.frame(Player, Team, Score, test)

However, when my data set contains a group of size 1:

Player <- c('A','B','B','C','C','C','D','D','D','D','E','E','E','E','E')
Team <- c('A','B','B','C','C','C','D','D','D','D','E','E','E','E','E')
Score <- c(5,2,7,3,9,6,3,7,1,7,3,8,3,4,1)
data.frame(Player, Team, Score)

test <- ave(Score, Player, Team, FUN = cumroll)
data.frame(Player, Team, Score, test)

I get this error:

Error in `split<-.default`(`*tmp*`, g, value = lapply(split(x, g), FUN)) : 
replacement has length zero

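I know there must be a way to modify the function to handle this. In these cases, when the group size is 1, I want it to return the observed value itself. Any help is appreciated!

1 Answer:

Answer 0 (score: 3)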

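The simplest way to change a function's behavior depending on the length of its input is to branch on that length. For example, you could use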

cumroll <- function(x) {
    if(length(x)<=1) {
        x 
    } else { 
        x <- head(x, -1)
        c(head(x,1), cumsum(x) / seq_along(x))
    }
}

Player <- c('A','B','B','C','C','C','D','D','D','D','E','E','E','E','E')
Team <- c('A','B','B','C','C','C','D','D','D','D','E','E','E','E','E')
Score <- c(5,2,7,3,9,6,3,7,1,7,3,8,3,4,1)

test <- ave(Score, Player, Team, FUN = cumroll)

> data.frame(Player, Team, Score, test)
   Player Team Score     test
1       A    A     5 5.000000
2       B    B     2 2.000000
3       B    B     7 2.000000
4       C    C     3 3.000000
5       C    C     9 3.000000
6       C    C     6 6.000000
7       D    D     3 3.000000
8       D    D     7 3.000000
9       D    D     1 5.000000
10      D    D     7 3.666667
11      E    E     3 3.000000
12      E    E     8 3.000000
13      E    E     3 5.500000
14      E    E     4 4.666667
15      E    E     1 4.500000
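
To see why the original version fails, it may help to call it directly on a single value. The sketch below copies the question's definition under the illustrative name cumroll_orig (that name is an assumption, not from the post) and checks the length of what it returns:

```r
# The question's original definition, renamed here for illustration only
cumroll_orig <- function(x) {
    x <- head(x, -1)                         # drops the last element
    c(head(x, 1), cumsum(x) / seq_along(x))
}

# A single value leaves nothing after head(x, -1), so the result is empty
length(cumroll_orig(5))
# 0

# Longer inputs return a vector as long as the input
length(cumroll_orig(c(2, 7)))
# 2
```

ave() expects FUN to return one value per row of the group, so a zero-length result for a one-row group is what triggers "replacement has length zero".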

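But I'm a bit concerned about your approach... How exactly is a cumulative mean lagged by one defined? You might look at shift in data.table or rollapply in zoo for better performance and robustness.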
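
One possible way to make the definition explicit, sketched here in base R rather than with shift or rollapply, is to compute the running mean first and then shift it right by one position, seeding the first slot with the first observation. The helper name lagged_cummean is hypothetical, and the seeding rule is an assumption chosen to reproduce the output above:

```r
# Hypothetical helper, not from the original post
lagged_cummean <- function(x) {
    if (length(x) == 0) return(x)            # nothing to do for empty groups
    running <- cumsum(x) / seq_along(x)      # mean of x[1..i] at position i
    c(x[1], head(running, -1))               # shift right by one, seed with x[1]
}

Player <- c('A','B','B','C','C','C')
Score  <- c(5,2,7,3,9,6)

# Size-1 groups fall out naturally: head(running, -1) is empty, leaving x[1]
ave(Score, Player, FUN = lagged_cummean)
```

Written this way, a group of size 1 needs no special branch, because the result always has the same length as the input.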