How to compute this variable in R

Time: 2015-08-07 14:06:42

Tags: r variables moving-average

I have the following data:

mydf[77:84,]
   id game_week points  code  web_name first_name second_name position team_name     date fixture team1 team2 home_away team_scored team_conceded minutes goals assists cleansheet goals_conceded own_goals
77  3         1     -2 51507 Koscielny    Laurent   Koscielny Defender   Arsenal 17/08/13 ARS-AVL   ARS   AVL         H           1             3      67     0       0          0              3         0
78  3         2      0 51507 Koscielny    Laurent   Koscielny Defender   Arsenal 24/08/13 FUL-ARS   ARS   FUL         A           3             1       0     0       0          0              0         0
79  3         3      6 51507 Koscielny    Laurent   Koscielny Defender   Arsenal 01/09/13 ARS-TOT   ARS   TOT         H           1             0      90     0       0          1              0         0
80  3         4      2 51507 Koscielny    Laurent   Koscielny Defender   Arsenal 14/09/13 SUN-ARS   ARS   SUN         A           3             1      90     0       0          0              1         0
81  3         5      2 51507 Koscielny    Laurent   Koscielny Defender   Arsenal 22/09/13 ARS-STK   ARS   STK         H           3             1      90     0       0          0              1         0
82  3         6      2 51507 Koscielny    Laurent   Koscielny Defender   Arsenal 28/09/13 SWA-ARS   ARS   SWA         A           2             1      90     0       0          0              1         0
83  3         7      3 51507 Koscielny    Laurent   Koscielny Defender   Arsenal 06/10/13 WBA-ARS   ARS   WBA         A           1             1      90     0       0          0              1         0
84  3         8      2 51507 Koscielny    Laurent   Koscielny Defender   Arsenal 19/10/13 ARS-NOR   ARS   NOR         H           4             1      90     0       0          0              1         0

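As part of a modelling exercise, I would like to create a new variable "mov_avg_min" which, for a given "id", is the average of the "minutes" played over the previous 3 "game_week" values. For example, the player with web_name "Koscielny" has id 3 in this data frame. So for id = 3 and game_week = 4, the function should compute mov_avg_min over game_weeks 1:3 (the 3 game weeks before the current game_week, for the same id). In row 80, therefore, mov_avg_min = (67 + 0 + 90) / 3 = 52.333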
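The target value for row 80 can be checked by hand in R. This is only the arithmetic from the example above, using the minutes from rows 77 to 79; the names `prev_minutes` and `target` are made up for illustration:

```r
# minutes played by id = 3 in game_weeks 1, 2 and 3 (rows 77:79 above)
prev_minutes <- c(67, 0, 90)

# expected mov_avg_min for game_week 4 (row 80)
target <- mean(prev_minutes)
round(target, 3)
```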

1 answer:

Answer 0 (score: 0)

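I think rollapply (from the zoo package) with width = 3 and a right-aligned window will include the value of the row you are considering. So for game 4 it will give you the average minutes over games 2, 3 and 4. I think you have to lag the minutes column first in order to get the average based on games 1, 2 and 3. See the simple example below: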

 library(dplyr)
 library(zoo)

 dt = data.frame(id = c(1,1,1,1,1,2,2,2,2,2),
                 games = c(1,2,3,4,5,1,2,3,4,5),
                 minutes = c(61,72,73,82,82,81,71,51,90,73))

 dt

 #    id games minutes
 # 1   1     1      61
 # 2   1     2      72
 # 3   1     3      73
 # 4   1     4      82
 # 5   1     5      82
 # 6   2     1      81
 # 7   2     2      71
 # 8   2     3      51
 # 9   2     4      90
 # 10  2     5      73

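 dt %>% group_by(id) %>%
   # shift minutes down one row within each id, so the window excludes the current game
   mutate(lag_minutes = lag(minutes, default=NA)) %>%
   # right-aligned mean over 3 rows: for game n this covers games n-3 to n-1
   mutate(RA = rollapply(lag_minutes,width=3,mean, align= "right", fill=NA))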


 # Source: local data frame [10 x 5]
 # Groups: id
 #
 #    id games minutes lag_minutes       RA
 # 1   1     1      61          NA       NA
 # 2   1     2      72          61       NA
 # 3   1     3      73          72       NA
 # 4   1     4      82          73 68.66667
 # 5   1     5      82          82 75.66667
 # 6   2     1      81          NA       NA
 # 7   2     2      71          81       NA
 # 8   2     3      51          71       NA
 # 9   2     4      90          51 67.66667
 # 10  2     5      73          90 70.66667
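
As a rough sketch, the same lag-then-roll approach can be applied to the rows from the question. The data frame `koscielny` and the object `result` are hypothetical names; the data frame is rebuilt by hand from rows 77:84 of mydf, keeping only id, game_week and minutes. The `arrange()` step is an added assumption that rows may not already be ordered by game_week within each id:

```r
library(dplyr)
library(zoo)

# Rows 77:84 of mydf from the question, reduced to the three columns needed
koscielny <- data.frame(id = 3,
                        game_week = 1:8,
                        minutes = c(67, 0, 90, 90, 90, 90, 90, 90))

result <- koscielny %>%
  group_by(id) %>%
  # make sure each player's rows are in game_week order before lagging
  arrange(game_week, .by_group = TRUE) %>%
  # lag first, then take a right-aligned mean over 3 rows,
  # so game_week n uses game_weeks n-3 to n-1
  mutate(mov_avg_min = rollapply(dplyr::lag(minutes), width = 3, FUN = mean,
                                 align = "right", fill = NA)) %>%
  ungroup()

result
```

For game_week 4 this should reproduce the value asked for in the question, (67 + 0 + 90) / 3, and the first three game weeks stay NA because fewer than three earlier games exist.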