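My dataset has the following columns: player ID, week, and points.
I want to compute the mean of the points from previous weeks, but not over all past weeks: only over the last 5 weeks, or fewer if the current week number is less than 5.
Example: for player_id = 5 and week = 7, the result should be the mean of POINTS for player_id = 5 over weeks 2, 3, 4, 5 and 6.
The following code already computes the mean over all previous weeks, so I need to adapt it to use only the previous 5 weeks.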
library(dplyr)
player_id<-c(rep(1,30),rep(2,30),rep(3,30),rep(4,30),rep(5,30))
week<-1:30
points<-round(runif(150,1,10),0)
mydata<- data.frame(player_id=player_id,week=rep(week,5),points)
mydata<-mydata %>%
  group_by(player_id) %>% # the group to perform the stat on
  arrange(week) %>% # order the weeks within each group
  mutate(previous_mean = cummean(points) ) %>% # for each week get the cumulative mean
  mutate(previous_mean = lag(previous_mean) ) %>% # shift cumulative mean back one week
  arrange(player_id) # sort by player_id
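To make the gap concrete, here is a minimal base-R sketch of what the code above computes for a single player. The `pts` vector is hypothetical (made up for illustration), and `cumsum(pts) / seq_along(pts)` stands in for dplyr's `cummean()`.

```r
# Hypothetical points for one player over weeks 1..7 (illustration only).
pts <- c(4, 8, 5, 9, 9, 1, 3)

# Base-R stand-in for cummean(): running mean up to and including each week.
running_mean <- cumsum(pts) / seq_along(pts)

# Shift by one week, as lag() does, so each week only sees earlier weeks.
previous_mean <- c(NA, head(running_mean, -1))

print(previous_mean[7])  # mean of weeks 1..6 (what the current code gives)
print(mean(pts[2:6]))    # mean of weeks 2..6 (what is actually wanted)
```

For week 7 the first value averages six weeks and the second only the five most recent ones, which is the difference the question is about.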
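Answer 0 (score: 1)
HAVB's approach is great, but depending on what you want, here is another one. This method is adapted from this answer to a different question, modified for your specific case: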
library(dplyr)
library(zoo)
# set the seed for reproducibility
set.seed(123)
player_id<-c(rep(1,30),rep(2,30),rep(3,30),rep(4,30),rep(5,30))
week<-1:30
points<-round(runif(150,1,10),0)
mydata<- data.frame(player_id=player_id,week=rep(week,5),points)
# right-aligned rolling mean over k values; partial=TRUE allows shorter windows at the start
roll_mean <- function(x, k) {
  result <- rollapplyr(x, k, mean, partial=TRUE, na.rm=TRUE)
  result[is.nan(result)] <- NA # a window holding only NA gives NaN; report it as NA
  return( result )
}
mydata<-mydata %>%
  group_by(player_id) %>%
  arrange(week) %>%
  mutate(rolling_mean = roll_mean(x=lag(points), k=5) ) %>% # lag() leaves out the current week
  arrange(player_id)
We can then look at a subset to show that it works:
mydata[mydata$player_id %in% 1:2 & mydata$week %in% 1:6, ]
# A tibble: 12 x 4
# Groups:   player_id [2]
   player_id  week points rolling_mean
       <dbl> <int>  <dbl>        <dbl>
 1         1     1      4           NA
 2         1     2      8     4.000000
 3         1     3      5     6.000000
 4         1     4      9     5.666667
 5         1     5      9     6.500000
 6         1     6      1     7.000000
 7         2     1     10           NA
 8         2     2      9    10.000000
 9         2     3      7     9.500000
10         2     4      8     8.666667
11         2     5      1     8.500000
12         2     6      5     7.000000
So we can see that at each time t, the rolling_mean for player i is the mean of that player's points observations at times {t - 1, ..., max(1, t - 5)}.
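As a cross-check of that window definition, the sketch below recomputes the column in base R, without zoo. The `points` values are player 1's weeks 1 to 6 as printed in the subset above; `prev_window_mean` is a hypothetical helper name, not part of any package.

```r
# Points for player 1, weeks 1-6, copied from the printed subset.
points <- c(4, 8, 5, 9, 9, 1)

# Hypothetical helper: mean of the (up to) k observations before position t.
prev_window_mean <- function(x, k = 5) {
  sapply(seq_along(x), function(t) {
    if (t == 1) return(NA_real_)   # week 1 has no earlier weeks
    idx <- max(1, t - k):(t - 1)   # weeks t-k .. t-1, clipped at week 1
    mean(x[idx])
  })
}

result <- prev_window_mean(points)
print(round(result, 6))
```

The values should agree with the rolling_mean column shown for player 1, which suggests the lag-then-roll approach implements the intended window.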
Answer 1 (score: 0)
You can use slice to select the most recent 5 weeks for each group. Try this:
player_id<-c(rep(1,30),rep(2,30),rep(3,30),rep(4,30),rep(5,30))
week<-1:30
points<-round(runif(150,1,10),0)
mydata<- data.frame(player_id=player_id,week=rep(week,5),points)
library(dplyr)
mydata <- mydata %>%
group_by(player_id) %>% # the group to perform the stat on
arrange(week) %>% # order the weeks within each group
slice( (n()-4):n() ) %>% # "slice" the last 5 rows (weeks) of every group
mutate(previous_mean = cummean(points) ) %>% # for each week get the cumulative mean
mutate(previous_mean = lag(previous_mean) ) %>% # shift cumulative mean back one week
arrange(player_id) # sort by player_id
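The line
slice( (n()-4):n() )
selects, for each group, the rows in the range [(last row - 4):last row].
Edit: to avoid problems when the number of weeks is less than 5, guard the start index with an ifelse statement: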
mydata %>%
group_by(player_id) %>% # the group to perform the stat on
arrange(week) %>% # order the weeks within each group
slice(ifelse(n() < 5, 1, n()-4):n()) %>% # "slice" the last 5 rows (weeks) of every group, or all rows if fewer than 5
mutate(previous_mean = cummean(points) ) %>% # for each week get the cumulative mean
mutate(previous_mean = lag(previous_mean) ) %>% # shift cumulative mean back one week
arrange(player_id) # sort by player_id
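To see what the slicing step keeps, here is a small sketch on made-up data. The `toy` data frame and its values are hypothetical, and `max(1, n() - 4)` is used here as an equivalent way to write the guard instead of ifelse.

```r
library(dplyr)

# Hypothetical toy data: player 1 has 7 weeks, player 2 only 3.
toy <- data.frame(
  player_id = c(rep(1, 7), rep(2, 3)),
  week      = c(1:7, 1:3),
  points    = c(4, 8, 5, 9, 9, 1, 3, 10, 9, 7)
)

last_weeks <- toy %>%
  group_by(player_id) %>%
  arrange(week, .by_group = TRUE) %>%  # sort weeks inside each player
  slice(max(1, n() - 4):n()) %>%       # last 5 rows, or all rows if fewer
  ungroup()

print(last_weeks)
```

Note that slice() drops every earlier row, so the result only contains each player's last weeks; any mean computed afterwards is limited to those rows.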