Calculating a cumulative mean with a rolling window condition

Date: 2017-11-05 21:51:33

Tags: r cumulative-sum

My dataset has the following columns: player ID, team, week and points.

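I want to compute the average of the TEAM's points over the previous weeks, but not over all previous weeks: only over the last 5 weeks, or over all earlier weeks when there are fewer than 5 of them (weeks 1 to 5).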

For example, for team = A and week = 7, the result would be the average of POINTS for team = A over weeks 2, 3, 4, 5 and 6.

The dataset can be created with the following code:

# set the seed for reproducibility
set.seed(123)
player_id<-c(rep(1,15),rep(2,15),rep(3,15),rep(4,15))
week<-1:15
team<-c(rep("A",30),rep("B",30))
points<-round(runif(60,1,10),0) 
mydata<- data.frame(player_id=player_id,team=team,week=rep(week,4),points)
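To make the example above concrete, the single value for team A in week 7 can be computed directly in base R. This is only a spot check of one cell, not the loop-free solution being asked for, and it assumes that "team points" means the weekly sum of points over the team's players (the reading both answers below use). The data-creation lines are repeated so the block runs on its own.

```r
# Recreate the data from the question
set.seed(123)
player_id <- c(rep(1, 15), rep(2, 15), rep(3, 15), rep(4, 15))
week <- 1:15
team <- c(rep("A", 30), rep("B", 30))
points <- round(runif(60, 1, 10), 0)
mydata <- data.frame(player_id = player_id, team = team,
                     week = rep(week, 4), points)

# Weekly team totals (sum of points over the team's players)
weekly <- aggregate(points ~ team + week, data = mydata, FUN = sum)

# Team A, week 7: average of the weekly totals for weeks 2 to 6
check <- mean(weekly$points[weekly$team == "A" & weekly$week %in% 2:6])
check
# [1] 11.8
```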

I'd like a solution without a lot of loops, because the dataset is large.

I asked related questions here that might help, but I couldn't adapt them to this case.

Question 1

Question 2

Thanks!

2 answers:

Answer 0 (score: 1)

If you want a dplyr solution, we can adapt the approach from my answer to one of your other questions:

library(dplyr)
library(zoo)
# set the seed for reproducibility
set.seed(123)
player_id<-c(rep(1,15),rep(2,15),rep(3,15),rep(4,15))
week<-1:15
team<-c(rep("A",30),rep("B",30))
points<-round(runif(60,1,10),0) 
mydata<- data.frame(player_id=player_id,team=team,week=rep(week,4),points)

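# Right-aligned rolling mean over a window of up to k values.
# partial = TRUE lets the first k - 1 positions use a shorter window;
# na.rm = TRUE is passed on to mean() so the leading NA from lag() is skipped.
roll_mean <- function(x, k) {
    result <- rollapplyr(x, k, mean, partial=TRUE, na.rm=TRUE)
    # a window containing only NA gives NaN; report it as NA instead
    result[is.nan(result)] <- NA
    return( result )
}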
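As a sketch of what `roll_mean(lag(x), k = 5)` computes, the same "mean of up to the previous k values" can also be written in base R with cumulative sums, which needs neither zoo nor an explicit loop. `prev_window_mean` is a hypothetical helper name, and the sketch assumes `x` contains no `NA` values (unlike `roll_mean`, it does not drop them).

```r
# Mean of the (up to) k values *before* each position of x.
# Intended to match roll_mean(lag(x), k) when x has no NAs.
prev_window_mean <- function(x, k) {
  cs  <- c(0, cumsum(x))        # cs[j + 1] is sum(x[1:j])
  i   <- seq_along(x)
  hi  <- i - 1                  # last earlier position
  lo  <- pmax(0, i - 1 - k)     # position just before the window starts
  cnt <- hi - lo                # number of earlier values in the window
  out <- (cs[hi + 1] - cs[lo + 1]) / cnt
  out[cnt == 0] <- NA           # the first position has no earlier values
  out
}

# Team A weekly totals for weeks 1 to 10, copied from the output in this answer
team_a <- c(13, 11, 6, 13, 19, 10, 13, 16, 16, 12)
prev_window_mean(team_a, k = 5)
```

The result matches the `rolling_team_mean` column for team A shown below. Differences of running totals can lose precision when the totals become very large, which is not a concern for integer point counts like these.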

It may be easier to aggregate by team first:

team_data <- mydata %>%
    select(-player_id) %>%
    group_by(team, week) %>%
    arrange(week) %>%
    summarise(team_points = sum(points)) %>%  # total per team and week; result stays grouped by team
    mutate(rolling_team_mean = roll_mean(lag(team_points), k=5)) %>%  # lag() excludes the current week
    arrange(team)

team_data

# A tibble: 30 x 4
# Groups:   team [2]
     team  week team_points rolling_team_mean
   <fctr> <int>       <dbl>             <dbl>
 1      A     1          13                NA
 2      A     2          11             13.00
 3      A     3           6             12.00
 4      A     4          13             10.00
 5      A     5          19             10.75
 6      A     6          10             12.40
 7      A     7          13             11.80
 8      A     8          16             12.20
 9      A     9          16             14.20
10      A    10          12             14.80
# ... with 20 more rows

Then, if you like, we can join the team-level results back onto the original data:

mydata <- inner_join(mydata, team_data, by = c("team", "week")) %>%
    arrange(week, team, player_id)

mydata[1:12, ]

   player_id team week points team_points rolling_team_mean
1          1    A    1      4          13                NA
2          2    A    1      9          13                NA
3          3    B    1     10          12                NA
4          4    B    1      2          12                NA
5          1    A    2      8          11                13
6          2    A    2      3          11                13
7          3    B    2      9          12                12
8          4    B    2      3          12                12
9          1    A    3      5           6                12
10         2    A    3      1           6                12
11         3    B    3      7          12                12
12         4    B    3      5          12                12

Answer 1 (score: 1)

Here is one way to do it in base R:

# compute points per team per week
pts <- with(mydata, tapply(points, list(team, week), sum, default = 0))
pts
#   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
#A 13 11  6 13 19 10 13 16 16 12 17 11 13 10  4
#B 12 12 12 11 10  6 13 11  6  9  5  7 13 13  6

# compute the 5-week averages
sapply(setNames(seq(2, ncol(pts)), seq(2, ncol(pts))),
       function(i) {
           apply(pts[, seq(max(1, i - 5), i - 1), drop = FALSE], 1, mean)
       })
#   2  3  4     5    6    7    8    9   10   11   12   13   14   15
#A 13 12 10 10.75 12.4 11.8 12.2 14.2 14.8 13.4 14.8 14.4 13.8 12.6
#B 12 12 12 11.75 11.4 10.2 10.4 10.2  9.2  9.0  8.8  7.6  8.0  9.4
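The `sapply()`/`apply()` step above loops over the weeks. A loop-free sketch of the same computation on the team-by-week matrix uses row-wise cumulative sums; the matrix is copied from the printed `pts` above so the example runs on its own, and the names `cs`, `lo`, `cnt` and `avg` are only illustrative. Like the original, it assumes the columns are consecutive weeks.

```r
# Team-by-week totals, copied from the printed `pts` above
pts <- rbind(A = c(13, 11, 6, 13, 19, 10, 13, 16, 16, 12, 17, 11, 13, 10, 4),
             B = c(12, 12, 12, 11, 10, 6, 13, 11, 6, 9, 5, 7, 13, 13, 6))
colnames(pts) <- 1:15

k   <- 5
cs  <- cbind(0, t(apply(pts, 1, cumsum)))  # cs[, j + 1] = total of weeks 1..j
i   <- 2:ncol(pts)                         # weeks with at least one earlier week
lo  <- pmax(0, i - 1 - k)                  # last week before each window
cnt <- i - 1 - lo                          # number of weeks in each window
# (total of weeks 1..i-1) - (total of weeks 1..lo), divided column-wise by cnt
avg <- sweep(cs[, i, drop = FALSE] - cs[, lo + 1, drop = FALSE], 2, cnt, "/")
colnames(avg) <- i
avg
```

This reproduces the table printed above while doing only one `cumsum()` per team.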

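If the week variable has gaps, this gives wrong results: the windows are taken by column position rather than by week number, and a week that is missing for only one team is filled with 0 because of `default = 0`.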
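One way around that problem is to define the window by week *values* instead of positions. The sketch below assumes "the last 5 weeks" means the calendar weeks w - 5 to w - 1, averaging only those that actually have data; `prev_weeks_mean` is a hypothetical helper, and `findInterval()` counts how many of a team's (sorted, unique) weeks fall at or below each window boundary. Week 4 is dropped to simulate a gap.

```r
set.seed(123)
player_id <- c(rep(1, 15), rep(2, 15), rep(3, 15), rep(4, 15))
team <- c(rep("A", 30), rep("B", 30))
points <- round(runif(60, 1, 10), 0)
mydata <- data.frame(player_id = player_id, team = team,
                     week = rep(1:15, 4), points)
mydata <- mydata[mydata$week != 4, ]   # remove week 4 to create a gap

# Mean of the totals for the weeks in [w - k, w - 1] that exist.
# Assumes `week` is sorted and has no duplicates.
prev_weeks_mean <- function(week, pts, k = 5) {
  cs  <- c(0, cumsum(pts))
  hi  <- findInterval(week - 1, week)      # number of weeks <= w - 1
  lo  <- findInterval(week - k - 1, week)  # number of weeks <= w - k - 1
  cnt <- hi - lo                           # weeks inside the window
  out <- (cs[hi + 1] - cs[lo + 1]) / cnt
  out[cnt == 0] <- NA                      # no earlier weeks in the window
  out
}

# Weekly team totals, sorted by team and week, then the rolling mean per team
weekly <- aggregate(points ~ team + week, data = mydata, FUN = sum)
weekly <- weekly[order(weekly$team, weekly$week), ]
res <- do.call(rbind, lapply(split(weekly, weekly$team), function(d) {
  d$rolling_team_mean <- prev_weeks_mean(d$week, d$points)
  d
}))
res[res$team == "A", ]
```

With week 4 missing, team A's value for week 7 averages only weeks 2, 3, 5 and 6. Under the other reading ("the last 5 weeks that have data"), the row-position approach in the first answer already behaves that way.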