R: mean of the last 3 rows (values in different columns), grouped by two columns

Time: 2018-09-08 16:33:36

Tags: r

DT:

HomeTeam       AwayTeam        Season     Htpoints  Atpoints
Mattersburg    Salzburg        2015/2016         3         0
Salzburg       Rapid Vienna    2015/2016         0         3
Admira         Mattersburg     2015/2016         3         0
Admira         Salzburg        2015/2016         1         1
Mattersburg    Ried            2015/2016         3         0
Ried           Salzburg        2015/2016         0         3
Altach         Mattersburg     2015/2016         3         0
Austria Vie    Mattersburg     2015/2016         3         0
Salzburg       Altach          2015/2016         3         0
Mattersburg    AC Wolfsberger  2015/2016         3         0
Salzburg       Austria Vienna  2015/2016         1         1
Rapid Vienna   Mattersburg     2015/2016         0         3
Sturm Graz     Salzburg        2015/2016         0         3
Salzburg       Grodig          2015/2016         3         0

To compute a team's average points over its last three home games:

library(zoo)

roll <- function(x, n) { 
if (length(x) <= n) NaN 
else rollapply(x, list(-seq(n)), mean, fill = NaN)
}

transform(DT, last3.HT.av.points = ave(Htpoints,Season,HomeTeam, FUN = function(x) roll(x, 3)))

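The above is not a problem. On the other hand...

Is it possible to compute the average points over the last 3 games regardless of whether the team played at home or away?

Desired output (values shown for Salzburg only):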

HomeTeam       AwayTeam        Season     Htpoints  Atpoints  HT.av.last3  AT.av.last3
Mattersburg    Salzburg        2015/2016         3         0                        NA
Salzburg       Rapid Vienna    2015/2016         0         3           NA
Admira         Mattersburg     2015/2016         3         0
Admira         Salzburg        2015/2016         1         1                        NA
Mattersburg    Ried            2015/2016         3         0
Ried           Salzburg        2015/2016         0         3                      0.33
Altach         Mattersburg     2015/2016         3         0
Austria Vie    Mattersburg     2015/2016         3         0
Salzburg       Altach          2015/2016         3         0         1.33
Mattersburg    AC Wolfsberger  2015/2016         3         0
Salzburg       Austria Vienna  2015/2016         1         1         2.33
Rapid Vienna   Mattersburg     2015/2016         0         3
Sturm Graz     Salzburg        2015/2016         0         3                      2.33
Salzburg       Grodig          2015/2016         3         0         2.33
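
As a sanity check on the desired values, the arithmetic for Salzburg can be reproduced in base R. This is only a sketch: the vector salzburg_points is transcribed by hand from the table above (Htpoints when Salzburg is the home team, Atpoints when it is the away team), and prev_mean is a hypothetical helper, not a function from zoo or data.table.

```r
# Salzburg's points in match order, read off the table above
# (rows 1, 2, 4, 6, 9, 11, 13 and 14), whether home or away.
salzburg_points <- c(0, 0, 1, 3, 3, 1, 3, 3)

# Hypothetical helper: mean of the n values *before* position i,
# NA when fewer than n earlier values exist.
prev_mean <- function(x, n = 3) {
  vapply(seq_along(x), function(i) {
    if (i <= n) NA_real_ else mean(x[(i - n):(i - 1)])
  }, numeric(1))
}

round(prev_mean(salzburg_points), 2)
# NA NA NA 0.33 1.33 2.33 2.33 2.33
```

The first three values are NA because Salzburg has fewer than three earlier games at that point; the remaining values match the HT.av.last3 and AT.av.last3 entries shown for Salzburg.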

Preferred: data.table

Reproducible dataset (different from the one above):

 library(data.table)
 DT <- fread("HomeTeam,AwayTeam,Season,Htpoints,Atpoints
        Grodig,Salzburg,2015/2016,0,3
        Rapid Vienna,Altach,2015/2016,1,1
        Ried,Austria Vienna,2015/2016,3,0
        Sturm Graz,Mattersburg,2015/2016,3,0
        Admira,Rapid Vienna,2015/2016,1,1
        Altach,Ried,2015/2016,0,3
        Austria Vienna,Sturm Graz,2015/2016,1,1
        Mattersburg,Grodig,2015/2016,3,0
        Salzburg,AC Wolfsberger,2015/2016,3,0")

 numTeams <- DT[,uniqueN(c(HomeTeam, AwayTeam))]

 firstHalf <- lapply(seq_len(DT[,.N]),
                function(n) data.table(
                  Matchday=n*2L-1L,
                  HomeTeam=DT[["HomeTeam"]],
                  AwayTeam=c(DT[["AwayTeam"]][-seq_len(n)], DT[["AwayTeam"]][seq_len(n)]),
                  Season=DT[["Season"]],
                  Htpoints=DT[["Htpoints"]],
                  Atpoints=DT[["Atpoints"]]
                ))

 secondHalf <- lapply(seq_len(DT[,.N]),
                 function(n) data.table(
                   Matchday=n*2L,
                   HomeTeam=DT[["AwayTeam"]],
                   AwayTeam=c(DT[["HomeTeam"]][-seq_len(n)], DT[["HomeTeam"]][seq_len(n)]),
                   Season=DT[["Season"]],
                   Htpoints=DT[["Htpoints"]],
                   Atpoints=DT[["Atpoints"]]
                 ))


DT <- rbindlist(c(firstHalf, secondHalf))[
HomeTeam!=AwayTeam][,
            .SD[1L], by=.(HomeTeam, AwayTeam)]
setorder(DT, Matchday, HomeTeam)
DT <- DT[,-c("Matchday")]

3 answers:

Answer 0 (score: 1)

library(tidyverse)
library(zoo)

DT_prep <- DT %>% 
  as_tibble() %>% 
  mutate(row = row_number()) 

DT_rollmeans <- DT_prep %>% 
  gather(teamside, teamname, -Season, -Htpoints, -Atpoints, -row) %>% 
  arrange(row) %>% 
  group_by(teamname) %>% 
  mutate(points = case_when(teamside == 'HomeTeam' ~ Htpoints,
                            teamside == 'AwayTeam' ~ Atpoints),
         roll_mean = zoo::rollapply(points, 3, mean, align = 'right', fill = NA)) %>% 
  ungroup() %>% 
  select(row, teamside, roll_mean) %>%
  spread(teamside, roll_mean) %>% 
  select(row, HT.av.last3 = HomeTeam, AT.av.last3 = AwayTeam)



DT_prep %>% left_join(DT_rollmeans, by = "row") %>% select(-row)

This produces a tibble like the following:

# A tibble: 90 x 7
   HomeTeam       AwayTeam       Season    Htpoints Atpoints HT.av.last3 AT.av.last3
   <chr>          <chr>          <chr>        <int>    <int>       <dbl>       <dbl>
 1 Admira         Ried           2015/2016        1        1          NA      NA    
 2 Altach         Sturm Graz     2015/2016        0        3          NA      NA    
 3 Austria Vienna Grodig         2015/2016        1        1          NA      NA    
 4 Grodig         Altach         2015/2016        0        3          NA      NA    
 5 Mattersburg    AC Wolfsberger 2015/2016        3        0          NA      NA    
 6 Rapid Vienna   Austria Vienna 2015/2016        1        1          NA      NA    
 7 Ried           Mattersburg    2015/2016        3        0          NA      NA    
 8 Sturm Graz     Rapid Vienna   2015/2016        3        0          NA      NA    
 9 AC Wolfsberger Grodig         2015/2016        3        0          NA       0.333
10 Mattersburg    Admira         2015/2016        3        0           2      NA    
# ... with 80 more rows

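For every team, the average is NA for its first 2 games and thereafter is the rolling mean over its last 3 games. The first team with at least three games of data is Grodig, which has a rolling mean of 0.333 from scores of 1, 0 and 0 in its first three games.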
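
One detail worth checking against the question: a width of 3 with align = "right" includes the current game in the window, whereas the roll() function in the question uses the offsets list(-seq(n)), which cover only the preceding games. A minimal sketch of the difference, using a made-up points vector (the values are an assumption for illustration, not taken from the dataset):

```r
library(zoo)

# Hypothetical points for one team, in match order.
points <- c(1, 0, 0, 3, 3)

# Width 3, right-aligned: the window ends at, and includes, the current game.
incl_current <- rollapply(points, 3, mean, align = "right", fill = NA)

# Offsets -1, -2, -3: the window covers only the three preceding games.
prev_only <- rollapply(points, list(-seq(3)), mean, fill = NA)

round(incl_current, 3)  # NA NA 0.333 1.000 2.000
round(prev_only, 3)     # NA NA NA 0.333 1.000
```

If the average should exclude the game being played, as in the desired output of the question, the offset form is the one to use inside the mutate() call.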

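I am not happy with my solution, but it works, and I'm sure someone can make it more compact.

Answer 1 (score: 1)

Using DT as shown reproducibly in the Note at the end, create a data table both that carries a row-number column i and has two rows for each row of DT: one for the home team and one for the away team. Then apply rollapply to it and insert the results back into DT. Note that rollapply automatically handles the case where a team has fewer than 3 previous rows, so no special code is needed for that.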

both <- rbind(
  DT[, list(HomeAway = "Home", Team = HomeTeam, Season, Points = Htpoints, i = .I)],
  DT[, list(HomeAway = "Away", Team = AwayTeam, Season, Points = Atpoints, i = .I)]
)

setkeyv(both, c("Season", "Team", "i"))
both[, Last3 := rollapply(Points, list(-seq(3)), mean, fill = NA_real_, na.rm = TRUE),
  by = "Season,Team"]

setkeyv(both, "i")
DT[, HtLast3 := both[HomeAway == "Home", Last3]][
   , AtLast3 := both[HomeAway == "Away", Last3]]

giving:

> DT
        HomeTeam       AwayTeam    Season Htpoints Atpoints  HtLast3   AtLast3
 1:  Mattersburg       Salzburg 2015/2016        3        0       NA        NA
 2:     Salzburg   Rapid Vienna 2015/2016        0        3       NA        NA
 3:       Admira    Mattersburg 2015/2016        3        0       NA        NA
 4:       Admira       Salzburg 2015/2016        1        1       NA        NA
 5:  Mattersburg           Ried 2015/2016        3        0       NA        NA
 6:         Ried       Salzburg 2015/2016        0        3       NA 0.3333333
 7:       Altach    Mattersburg 2015/2016        3        0       NA 2.0000000
 8:  Austria Vie    Mattersburg 2015/2016        3        0       NA 1.0000000
 9:     Salzburg         Altach 2015/2016        3        0 1.333333        NA
10:  Mattersburg AC Wolfsberger 2015/2016        3        0 1.000000        NA
11:     Salzburg Austria Vienna 2015/2016        1        1 2.333333        NA
12: Rapid Vienna    Mattersburg 2015/2016        0        3       NA 1.0000000
13:   Sturm Graz       Salzburg 2015/2016        0        3       NA 2.3333333
14:     Salzburg         Grodig 2015/2016        3        0 2.333333        NA

Note

DF <-
structure(list(HomeTeam = c("Mattersburg", "Salzburg", "Admira", 
"Admira", "Mattersburg", "Ried", "Altach", "Austria Vie", "Salzburg", 
"Mattersburg", "Salzburg", "Rapid Vienna", "Sturm Graz", "Salzburg"
), AwayTeam = c("Salzburg", "Rapid Vienna", "Mattersburg", "Salzburg", 
"Ried", "Salzburg", "Mattersburg", "Mattersburg", "Altach", "AC Wolfsberger", 
"Austria Vienna", "Mattersburg", "Salzburg", "Grodig"), Season = c("2015/2016", 
"2015/2016", "2015/2016", "2015/2016", "2015/2016", "2015/2016", 
"2015/2016", "2015/2016", "2015/2016", "2015/2016", "2015/2016", 
"2015/2016", "2015/2016", "2015/2016"), Htpoints = c(3L, 0L, 
3L, 1L, 3L, 0L, 3L, 3L, 3L, 3L, 1L, 0L, 0L, 3L), Atpoints = c(0L, 
3L, 0L, 1L, 0L, 3L, 0L, 0L, 0L, 0L, 1L, 3L, 3L, 0L)), 
class = "data.frame", row.names = c(NA, -14L))

DT <- as.data.table(DF)

Answer 2 (score: 0)

I had trouble working with your dataset, so I made my own that resembles yours:

Home <- sample(c("A","B","C","D"), 9, replace = TRUE)
Away <- sample(c("A","B","C","D"), 9, replace = TRUE)
Home_Points <- sample(c(0,1,3), 9, replace = TRUE)
Away_Points <- sample(c(0,1,3), 9, replace = TRUE)

dt<-data.frame(HomeTeam=Home,
               AwayTeam=Away, 
               Htpoints=Home_Points,Atpoints=Away_Points,
               stringsAsFactors = FALSE)

My dataset is:

  HomeTeam AwayTeam Htpoints Atpoints
1        C        C        0        1
2        D        B        1        1
3        D        B        3        0
4        A        B        0        3
5        C        D        1        3
6        C        A        1        3
7        C        D        1        1
8        D        A        1        3
9        D        B        3        3

Solution: combine the home teams, the away teams and their points

team  <- as.vector(rbind(dt[,1],dt[,2]))
points<- as.vector(rbind(dt[,3],dt[,4]))

newDT<-data.frame( team=team,points=points,stringsAsFactors = FALSE)
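
The as.vector(rbind(...)) idiom works because R stores matrices in column-major order, so flattening a two-row matrix alternates between the first and second row. A small sketch with made-up vectors (home and away are illustrative names, not objects from the answer):

```r
# Illustrative vectors: three matches, home and away team per match.
home <- c("C", "D", "A")
away <- c("C", "B", "B")

# rbind() builds a 2 x 3 matrix; as.vector() reads it column by column,
# which yields home1, away1, home2, away2, ...
interleaved <- as.vector(rbind(home, away))
interleaved
# "C" "C" "D" "B" "A" "B"
```

Because the points are interleaved the same way, each element of team lines up with the points that team earned in that match.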

Finally, sum the points per team, regardless of whether the team played at home or away:

library(tidyverse)

newDT %>%
  group_by(team) %>%
  summarise_all(sum) 

The result is:

 team  points
  <chr>  <dbl>
1 A          6
2 B          7
3 C          4
4 D         12

If the season can also vary, you can add the season to the new dataset as well and then arrange by it.
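
One way to read that suggestion is to carry the season along in the stacked data and total the points per season and team. The sketch below uses base R only, so that it runs without extra packages; the data frame dt2 and its Season column are assumptions made for illustration, not part of the answer's dataset.

```r
# Hypothetical data: same layout as dt above, plus an assumed Season column.
dt2 <- data.frame(
  HomeTeam = c("A", "B", "A", "B"),
  AwayTeam = c("B", "A", "B", "A"),
  Season   = c("2015/2016", "2015/2016", "2016/2017", "2016/2017"),
  Htpoints = c(3, 1, 0, 3),
  Atpoints = c(0, 1, 3, 0),
  stringsAsFactors = FALSE
)

# One row per team per match: home rows stacked on top of away rows.
long <- data.frame(
  team   = c(dt2$HomeTeam, dt2$AwayTeam),
  season = c(dt2$Season, dt2$Season),
  points = c(dt2$Htpoints, dt2$Atpoints),
  stringsAsFactors = FALSE
)

# Sum points per season and team, then sort by season and team.
totals <- aggregate(points ~ season + team, data = long, FUN = sum)
totals <- totals[order(totals$season, totals$team), ]
totals
# 2015/2016: A = 4, B = 1; 2016/2017: A = 0, B = 6
```

The same grouping could be expressed with group_by(season, team) followed by summarise() in the tidyverse style used above.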