Using a logical condition

Time: 2017-11-02 20:43:59

Tags: r mean

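I have a football dataset for one season with a few variables: player_id, week, and points (each player's score in a match).

So each player_id appears multiple times in my dataset.

My goal is to compute the mean points for each player, but only over the preceding weeks.

For example, in the row with player_id = 5445 and week = 10, I want the mean of points over the rows where player_id = 5445 and week is 1 to 9.

I know I could filter the data for each row and compute it. But I'd like to do this in a smarter/faster way...

What I had in mind: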

aggregate(mydata$points, FUN=mean, 
          by=list(player_id=mydata$player_id, week<mydata$week))

But it doesn't work.

Thanks!

2 answers:

Answer 0: (score: 1)

Here is a solution, along with some sample data:

football_df <- 
  data.frame(player_id = c(1, 2, 3, 4),
             points = as.integer(runif(40, 0, 10)), 
             week = rep(1:10, each = 4))

To get the running average:

library(dplyr)
football_df %>% 
      group_by(player_id) %>%    # the group to perform the stat on
      arrange(week) %>%          # order the rows by week
      mutate(avg = cummean(points) ) %>% # for each week get the cumulative mean
      mutate(avg = lag(avg) ) %>% # shift down one row so each week gets the mean of the earlier weeks
      arrange(player_id) # sort by player_id

Here are the first two players in the resulting table. You can see that for player 1 in week 2 the average over the earlier weeks is 7, and in week 3 it is (7 + 9) / 2 = 8:

   player_id points week      avg
1          1      7    1       NA
2          1      9    2 7.000000
3          1      9    3 8.000000
4          1      1    4 8.333333
5          1      4    5 6.500000
6          1      8    6 6.000000
7          1      0    7 6.333333
8          1      2    8 5.428571
9          1      5    9 5.000000
10         1      8   10 5.000000
11         2      6    1       NA
12         2      9    2 6.000000
13         2      5    3 7.500000
14         2      1    4 6.666667
15         2      0    5 5.250000
16         2      9    6 4.200000
17         2      8    7 5.000000
18         2      6    8 5.428571
19         2      6    9 5.500000
20         2      8   10 5.555556
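
If you would rather not depend on dplyr, the same "mean of the earlier weeks" idea can probably be sketched in base R with ave() and cumsum(). This is only a sketch: the helper name prior_mean is made up for illustration, and the data below is just the points for players 1 and 2 copied from the table above.

```r
# Points for players 1 and 2, copied from the table above
football_df <- data.frame(
  player_id = rep(c(1, 2), each = 10),
  week      = rep(1:10, times = 2),
  points    = c(7, 9, 9, 1, 4, 8, 0, 2, 5, 8,
                6, 9, 5, 1, 0, 9, 8, 6, 6, 8)
)

# Sort so that the cumulative sums run in week order within each player
football_df <- football_df[order(football_df$player_id, football_df$week), ]

# Hypothetical helper: mean of all earlier weeks =
# (running total without the current week) / (number of earlier weeks)
prior_mean <- function(p) {
  n_prior <- seq_along(p) - 1
  out <- (cumsum(p) - p) / n_prior
  out[n_prior == 0] <- NA  # the first week has no earlier weeks
  out
}

# Apply the helper separately to each player's points
football_df$avg <- ave(football_df$points, football_df$player_id, FUN = prior_mean)
```

Subtracting the current week's points from the running total is what keeps the current week out of its own average, which plays the same role as the lag() step in the dplyr version.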

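Answer 1: (score: 1)

I will use your data, but with a call to set.seed so the results are reproducible. Then I will call aggregate using its formula interface. Note that I have renamed the cutoff variable from week to last_week; it is used in the call to subset.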

set.seed(2550)    # make the results reproducible

player_id <- c(3242,56546,76575,4234,654654,6564,43242,42344,4342,6776,5432,8796,54767)
week <- 1:30
points <- rnorm(390)
mydata <- data.frame(player_id = rep(player_id, 30), 
                     week = rep(week,13),points)

last_week <- 10
agg <- aggregate(points ~ player_id + week, data = subset(mydata, week < last_week), mean)
head(agg)
#  player_id week     points
#1      3242    1 -1.3281831
#2      4234    1  0.3578657
#3      4342    1 -0.8267423
#4      5432    1 -0.4245487
#5      6564    1 -0.2968879
#6      6776    1  0.8348178
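
The call above groups by both player_id and week, and in this simulated data each player/week pair has exactly one row, so every group mean is just that row's points. If what you want is a single mean per player over the weeks before last_week, one possible variant is to drop week from the formula. A sketch on the same simulated data; agg_player is a name chosen here for illustration.

```r
set.seed(2550)    # make the results reproducible

# Same simulated data as above: 13 players x 30 weeks, one row per pair
player_id <- c(3242, 56546, 76575, 4234, 654654, 6564, 43242,
               42344, 4342, 6776, 5432, 8796, 54767)
week <- 1:30
points <- rnorm(390)
mydata <- data.frame(player_id = rep(player_id, 30),
                     week = rep(week, 13), points)

last_week <- 10

# Group by player only, so the mean runs over all weeks before last_week
agg_player <- aggregate(points ~ player_id,
                        data = subset(mydata, week < last_week),
                        FUN = mean)
head(agg_player)
```

This gives one row per player for a single fixed cutoff; it does not produce a separate average for every row's own week.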