Rolling percentiles

Time: 2016-02-18 22:20:12

Tags: r quantile percentile

A small sample of my dataset:

TEAM1 <- c("ATL", "CHI", "CLE", "DET", "GSW", "NOP", "BKN","ATL", "PHI","CHI")
HOME.AWAY <- c("vs.", "vs.", "@", "@", "vs.", "@", "vs.","vs.", "@","@")
TEAM2 <- c("DET", "CLE", "CHI", "ATL", "NOP", "GSW", "CHI","PHI", "ATL","BKN")
DATE <- as.Date(c("2015-05-14", "2015-05-14", "2015-05-14",
       "2015-05-14","2015-05-14", "2015-05-14", "2015-05-15","2015-05-15",
       "2015-05-15","2015-05-15"))
PTS <- c(94, 97, 95, 106, 111, 95, 100,112,87, 94)
df <- data.frame(TEAM1,HOME.AWAY,TEAM2,PTS,DATE)

df

   TEAM1 HOME.AWAY TEAM2 PTS       DATE
   ATL       vs.   DET  94 2015-05-14
   CHI       vs.   CLE  97 2015-05-14
   CLE         @   CHI  95 2015-05-14
   DET         @   ATL 106 2015-05-14
   GSW       vs.   NOP 111 2015-05-14
   NOP         @   GSW  95 2015-05-14
   BKN       vs.   CHI 100 2015-05-15
   ATL       vs.   PHI 112 2015-05-15
   PHI         @   ATL  87 2015-05-15
   CHI         @   BKN  94 2015-05-15

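The data frame is organized at the team level, so each game produces two rows of data: for example, Atlanta vs. Detroit (row 1) and Detroit @ Atlanta (row 4). Each row then holds TEAM1's statistics for that game (PTS, REB, AST, ...). For this example I have only included the points variable, PTS.

I have already been able to create rolling mean and rolling median variables for each team: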
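The two-rows-per-game layout can be checked with a small base R sketch. The `game_id` key below is a hypothetical helper, not part of the original data: it pastes the date together with the two team codes in alphabetical order, so both rows of a game share the same key. The sketch assumes a pair of teams meets at most once per date.

```r
# Team and date columns copied from the sample data above
TEAM1 <- c("ATL", "CHI", "CLE", "DET", "GSW", "NOP", "BKN", "ATL", "PHI", "CHI")
TEAM2 <- c("DET", "CLE", "CHI", "ATL", "NOP", "GSW", "CHI", "PHI", "ATL", "BKN")
DATE  <- as.Date(c(rep("2015-05-14", 6), rep("2015-05-15", 4)))

# Hypothetical game key: date plus both teams in alphabetical order,
# so "ATL vs. DET" and "DET @ ATL" map to the same value
game_id <- paste(DATE, pmin(TEAM1, TEAM2), pmax(TEAM1, TEAM2))

# Number of rows per game; every count should be 2 if the layout holds
rows_per_game <- table(game_id)
print(rows_per_game)
```

If any count differs from 2, one side of a game is missing or duplicated, which would also distort any per-team rolling statistic.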

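library(dplyr)  # group_by, mutate, lag
library(TTR)    # runMean, runMedian

df <- df[order(df$DATE), ]

# Cumulative mean of each team's earlier games (lagged so the current game is excluded)
df <- df %>% group_by(TEAM1) %>% 
  mutate(PTS_YTD = lag(runMean(PTS, n = 1, cumulative = TRUE), 1))

# Cumulative median, stored under its own name so it does not overwrite the mean
df <- df %>% group_by(TEAM1) %>% 
  mutate(PTS_MED_YTD = lag(runMedian(PTS, n = 1, cumulative = TRUE), 1))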

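I would like to create new variables that carry more information about the distribution of PTS. For example, I would like to create four new variables:

  • 25th percentile of PTS
  • 40th percentile of PTS
  • 60th percentile of PTS
  • 75th percentile of PTS

These new variables should be rolling, in the same way as runMean.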
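One possible way to get such variables, sketched in base R only. It assumes "rolling" means the same window as the runMean call above: all of a team's earlier games, with the current game excluded. The helper `lagged_cum_quantile` and the column names `PTS_P25`, `PTS_P40`, `PTS_P60`, `PTS_P75` are made up for illustration, and `quantile()` is used with its default interpolation (type 7), which may not be the percentile definition you want.

```r
# Sample columns copied from the question
TEAM1 <- c("ATL", "CHI", "CLE", "DET", "GSW", "NOP", "BKN", "ATL", "PHI", "CHI")
DATE  <- as.Date(c(rep("2015-05-14", 6), rep("2015-05-15", 4)))
PTS   <- c(94, 97, 95, 106, 111, 95, 100, 112, 87, 94)
df <- data.frame(TEAM1, PTS, DATE)

# Hypothetical helper: expanding quantile lagged by one observation.
# Element i uses only x[1], ..., x[i-1]; the first element is NA.
lagged_cum_quantile <- function(x, prob) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)[-1]) {
    out[i] <- quantile(x[seq_len(i - 1)], probs = prob, names = FALSE)
  }
  out
}

# Games must be in chronological order before computing per-team history
df <- df[order(df$DATE), ]

# One new column per requested percentile, computed within each team
for (p in c(25, 40, 60, 75)) {
  df[[paste0("PTS_P", p)]] <- ave(
    df$PTS, df$TEAM1,
    FUN = function(x) lagged_cum_quantile(x, p / 100)
  )
}

print(df)
```

With the sample data only ATL and CHI have a previous game, so every other row is NA, and a history of one game gives that game's score for every percentile. The loop recomputes each quantile from scratch, which is simple but quadratic in the number of games per team; that is usually fine for a season of games.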

0 answers:

No answers