Here is a small representative sample of my dataset:
TEAM1 <- c("ATL", "CHI", "CLE", "DET", "GSW", "NOP", "BKN","ATL", "PHI","CHI")
HOME.AWAY <- c("vs.", "vs.", "@", "@", "vs.", "@", "vs.","vs.", "@","@")
TEAM2 <- c("DET", "CLE", "CHI", "ATL", "NOP", "GSW", "CHI","PHI", "ATL","BKN")
DATE <- as.Date(c("2015-05-14", "2015-05-14", "2015-05-14",
"2015-05-14","2015-05-14", "2015-05-14", "2015-05-15","2015-05-15",
"2015-05-15","2015-05-15"))
PTS <- c(94, 97, 95, 106, 111, 95, 100,112,87, 94)
df <- data.frame(TEAM1,HOME.AWAY,TEAM2,PTS,DATE)
df
   TEAM1 HOME.AWAY TEAM2 PTS       DATE
1    ATL       vs.   DET  94 2015-05-14
2    CHI       vs.   CLE  97 2015-05-14
3    CLE         @   CHI  95 2015-05-14
4    DET         @   ATL 106 2015-05-14
5    GSW       vs.   NOP 111 2015-05-14
6    NOP         @   GSW  95 2015-05-14
7    BKN       vs.   CHI 100 2015-05-15
8    ATL       vs.   PHI 112 2015-05-15
9    PHI         @   ATL  87 2015-05-15
10   CHI         @   BKN  94 2015-05-15
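The data frame is organized at the team level, so every game produces two rows: for example, Atlanta vs. Detroit (row 1) and Detroit @ Atlanta (row 4). Each row holds TEAM1's box-score stats (PTS, REB, AST, ...). For this example I've only included the points-scored variable, PTS.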
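Because every game appears twice (once from each team's perspective), each row can be matched to its mirror row by swapping TEAM1 and TEAM2 on the same DATE. A minimal base-R sketch, assuming the sample data above; the OPP_PTS column and the n_games variable are hypothetical names used only for illustration:

```r
# Rebuild the sample data from the question
df <- data.frame(
  TEAM1     = c("ATL", "CHI", "CLE", "DET", "GSW", "NOP", "BKN", "ATL", "PHI", "CHI"),
  HOME.AWAY = c("vs.", "vs.", "@", "@", "vs.", "@", "vs.", "vs.", "@", "@"),
  TEAM2     = c("DET", "CLE", "CHI", "ATL", "NOP", "GSW", "CHI", "PHI", "ATL", "BKN"),
  PTS       = c(94, 97, 95, 106, 111, 95, 100, 112, 87, 94),
  DATE      = as.Date(rep(c("2015-05-14", "2015-05-15"), times = c(6, 4))),
  stringsAsFactors = FALSE
)

# Mirror each row: the same game seen from the opponent's side
opp <- df[, c("TEAM1", "TEAM2", "DATE", "PTS")]
names(opp) <- c("TEAM2", "TEAM1", "DATE", "OPP_PTS")  # hypothetical column name

# Inner join on (team, opponent, date): each row should find exactly one mirror
paired  <- merge(df, opp, by = c("TEAM1", "TEAM2", "DATE"))
n_games <- nrow(paired) / 2  # two rows per game

print(paired[, c("TEAM1", "TEAM2", "DATE", "PTS", "OPP_PTS")])
```

If nrow(paired) is smaller than nrow(df), some games are missing their second row, which is a quick sanity check on the team-level layout.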
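I've already been able to create rolling (season-to-date, excluding the current game) mean and median variables for each team: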
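library(dplyr)  # %>%, group_by(), mutate(), lag(), ungroup()
library(TTR)    # runMean(), runMedian()
df <- df[order(df$DATE), ]
# Separate column names so the median no longer overwrites the mean
df <- df %>% group_by(TEAM1) %>%
  mutate(PTS_MEAN_YTD   = dplyr::lag(runMean(PTS, n = 1, cumulative = TRUE), 1),
         PTS_MEDIAN_YTD = dplyr::lag(runMedian(PTS, n = 1, cumulative = TRUE), 1)) %>%
  ungroup()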
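For reference, lag(runMean(PTS, n = 1, cumulative = TRUE), 1) gives, for each game, the mean over that team's previous games only, with NA for a team's first game. Below is a dependency-free base-R sketch of the same idea, assuming rows are sorted by DATE; the helper lagged_cum and the output column names are hypothetical, and the loop is O(n^2), written for clarity rather than speed:

```r
# Rebuild the sample data from the question
df <- data.frame(
  TEAM1 = c("ATL", "CHI", "CLE", "DET", "GSW", "NOP", "BKN", "ATL", "PHI", "CHI"),
  TEAM2 = c("DET", "CLE", "CHI", "ATL", "NOP", "GSW", "CHI", "PHI", "ATL", "BKN"),
  PTS   = c(94, 97, 95, 106, 111, 95, 100, 112, 87, 94),
  DATE  = as.Date(rep(c("2015-05-14", "2015-05-15"), times = c(6, 4))),
  stringsAsFactors = FALSE
)
df <- df[order(df$DATE), ]

# Hypothetical helper: element i = f(x[1:(i - 1)]), i.e. the statistic over
# all earlier values only; the first element has no history, so it is NA
lagged_cum <- function(x, f) {
  vapply(seq_along(x), function(i) {
    if (i == 1) NA_real_ else f(x[seq_len(i - 1)])
  }, numeric(1))
}

# ave() applies the helper within each TEAM1 group and keeps the row order
df$PTS_MEAN_YTD   <- ave(df$PTS, df$TEAM1, FUN = function(x) lagged_cum(x, mean))
df$PTS_MEDIAN_YTD <- ave(df$PTS, df$TEAM1, FUN = function(x) lagged_cum(x, median))
print(df)
```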
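I'd like to create new variables that capture more information about the distribution of PTS. For example, I'd like to create 4 new variables: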
Like the runMean variable, these new variables should be rolling, i.e. computed only from each team's previous games.
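The four target statistics are not listed above, so this sketch uses the 25th and 75th percentiles, the standard deviation and the maximum purely as illustrative placeholders; any function that maps a numeric vector to a single number can be plugged in the same way. It assumes base R only, rows sorted by DATE, and hypothetical helper and column names (lagged_cum, ytd_stats, PTS_*_YTD):

```r
# Rebuild the sample data from the question
df <- data.frame(
  TEAM1 = c("ATL", "CHI", "CLE", "DET", "GSW", "NOP", "BKN", "ATL", "PHI", "CHI"),
  PTS   = c(94, 97, 95, 106, 111, 95, 100, 112, 87, 94),
  DATE  = as.Date(rep(c("2015-05-14", "2015-05-15"), times = c(6, 4))),
  stringsAsFactors = FALSE
)
df <- df[order(df$DATE), ]

# Hypothetical helper: statistic f over all strictly earlier values (NA for the first)
lagged_cum <- function(x, f) {
  vapply(seq_along(x), function(i) {
    if (i == 1) NA_real_ else f(x[seq_len(i - 1)])
  }, numeric(1))
}

# Placeholder statistics; swap in whichever four you actually need
ytd_stats <- list(
  PTS_Q25_YTD = function(x) quantile(x, 0.25, names = FALSE),
  PTS_Q75_YTD = function(x) quantile(x, 0.75, names = FALSE),
  PTS_SD_YTD  = sd,   # NA until a team has at least two prior games
  PTS_MAX_YTD = max
)

# One new rolling column per statistic, computed within each TEAM1 group
for (nm in names(ytd_stats)) {
  f <- ytd_stats[[nm]]
  df[[nm]] <- ave(df$PTS, df$TEAM1, FUN = function(x) lagged_cum(x, f))
}
print(df)
```

With only one or two games per team in this sample most values are NA or equal to the single prior game; the columns become informative once each team has a longer history.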