DT:
HomeTeam AwayTeam Season Htpoints Atpoints
Mattersburg Salzburg 2015/2016 3 0
Salzburg Rapid Vienna 2015/2016 0 3
Admira Mattersburg 2015/2016 3 0
Admira Salzburg 2015/2016 1 1
Mattersburg Ried 2015/2016 3 0
Ried Salzburg 2015/2016 0 3
Altach Mattersburg 2015/2016 3 0
Austria Vie Mattersburg 2015/2016 3 0
Salzburg Altach 2015/2016 3 0
Mattersburg AC Wolfsberger 2015/2016 3 0
Salzburg Austria Vienna 2015/2016 1 1
Rapid Vienna Mattersburg 2015/2016 0 3
Sturm Graz Salzburg 2015/2016 0 3
Salzburg Grodig 2015/2016 3 0
To compute a team's average points over its last three home games:
library(zoo)
roll <- function(x, n) {
if (length(x) <= n) NaN
else rollapply(x, list(-seq(n)), mean, fill = NaN)
}
transform(DT, last3.HT.av.points = ave(Htpoints,Season,HomeTeam, FUN = function(x) roll(x, 3)))
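The above is not a problem. On the other hand...
Is it possible to compute the average points over the last 3 games regardless of whether the team played at home or away?
Desired output (values shown only for the Salzburg team):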
HomeTeam AwayTeam Season Htpoints Atpoints HT.av.last3 AT.av.last3
Mattersburg Salzburg 2015/2016 3 0 NA
Salzburg Rapid Vienna 2015/2016 0 3 NA
Admira Mattersburg 2015/2016 3 0
Admira Salzburg 2015/2016 1 1 NA
Mattersburg Ried 2015/2016 3 0
Ried Salzburg 2015/2016 0 3 0.33
Altach Mattersburg 2015/2016 3 0
Austria Vie Mattersburg 2015/2016 3 0
Salzburg Altach 2015/2016 3 0 1.33
Mattersburg AC Wolfsberger 2015/2016 3 0
Salzburg Austria Vienna 2015/2016 1 1 2.33
Rapid Vienna Mattersburg 2015/2016 0 3
Sturm Graz Salzburg 2015/2016 0 3 2.33
Salzburg Grodig 2015/2016 3 0 2.33
Preference: data.table
Reproducible dataset (different from the one above):
library(data.table)
DT <- fread("HomeTeam,AwayTeam,Season,Htpoints,Atpoints
Grodig,Salzburg,2015/2016,0,3
Rapid Vienna,Altach,2015/2016,1,1
Ried,Austria Vienna,2015/2016,3,0
Sturm Graz,Mattersburg,2015/2016,3,0
Admira,Rapid Vienna,2015/2016,1,1
Altach,Ried,2015/2016,0,3
Austria Vienna,Sturm Graz,2015/2016,1,1
Mattersburg,Grodig,2015/2016,3,0
Salzburg,AC Wolfsberger,2015/2016,3,0")
numTeams <- DT[,uniqueN(c(HomeTeam, AwayTeam))]
firstHalf <- lapply(seq_len(DT[,.N]),
function(n) data.table(
Matchday=n*2L-1L,
HomeTeam=DT[["HomeTeam"]],
AwayTeam=c(DT[["AwayTeam"]][-seq_len(n)], DT[["AwayTeam"]][seq_len(n)]),
Season=DT[["Season"]],
Htpoints=DT[["Htpoints"]],
Atpoints=DT[["Atpoints"]]
))
secondHalf <- lapply(seq_len(DT[,.N]),
function(n) data.table(
Matchday=n*2L,
HomeTeam=DT[["AwayTeam"]],
AwayTeam=c(DT[["HomeTeam"]][-seq_len(n)], DT[["HomeTeam"]][seq_len(n)]),
Season=DT[["Season"]],
Htpoints=DT[["Htpoints"]],
Atpoints=DT[["Atpoints"]]
))
DT <- rbindlist(c(firstHalf, secondHalf))[
HomeTeam!=AwayTeam][,
.SD[1L], by=.(HomeTeam, AwayTeam)]
setorder(DT, Matchday, HomeTeam)
DT <- DT[,-c("Matchday")]
Answer 0 (score: 1)
library(tidyverse)
library(zoo)
DT_prep <- DT %>%
as_tibble() %>%
mutate(row = row_number())
DT_rollmeans <- DT_prep %>%
gather(teamside, teamname, -Season, -Htpoints, -Atpoints, -row) %>%
arrange(row) %>%
group_by(teamname) %>%
mutate(points = case_when(teamside == 'HomeTeam' ~ Htpoints,
teamside == 'AwayTeam' ~ Atpoints),
roll_mean = zoo::rollapply(points, 3, mean, align = 'right', fill = NA)) %>%
ungroup() %>%
select(row, teamside, roll_mean) %>%
spread(teamside, roll_mean) %>%
select(row, HT.av.last3 = HomeTeam, AT.av.last3 = AwayTeam)
DT_prep %>% left_join(DT_rollmeans) %>% select(-row)
This produces a tibble that looks like this:
# A tibble: 90 x 7
HomeTeam AwayTeam Season Htpoints Atpoints HT.av.last3 AT.av.last3
<chr> <chr> <chr> <int> <int> <dbl> <dbl>
1 Admira Ried 2015/2016 1 1 NA NA
2 Altach Sturm Graz 2015/2016 0 3 NA NA
3 Austria Vienna Grodig 2015/2016 1 1 NA NA
4 Grodig Altach 2015/2016 0 3 NA NA
5 Mattersburg AC Wolfsberger 2015/2016 3 0 NA NA
6 Rapid Vienna Austria Vienna 2015/2016 1 1 NA NA
7 Ried Mattersburg 2015/2016 3 0 NA NA
8 Sturm Graz Rapid Vienna 2015/2016 3 0 NA NA
9 AC Wolfsberger Grodig 2015/2016 3 0 NA 0.333
10 Mattersburg Admira 2015/2016 3 0 2 NA
# ... with 80 more rows
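For every team, the average is NA for its first 2 games, and after that it is the rolling mean of its last 3 games. The first team with data for at least three games is Grodig, whose first three games scored 1, 0 and 0 points, giving a rolling mean of 0.333.
I'm not happy with my solution, but it works, and I'm sure someone can make it more compact.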
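One detail worth checking against the desired output in the question: a right-aligned window of width 3 includes the current game, while the question's own roll() uses the offsets list(-seq(n)), which cover only the previous games. The sketch below contrasts the two on a made-up points vector (pts is illustrative, not taken from the dataset).

```r
library(zoo)

# Hypothetical points of one team, in chronological order
pts <- c(1, 0, 0, 3, 3)

# Window of the current game and the two before it
incl <- rollapply(pts, 3, mean, align = "right", fill = NA)

# Window of the three games before the current one (offsets -1, -2, -3)
excl <- rollapply(pts, list(-seq(3)), mean, fill = NA)

print(cbind(pts, incl, excl))
```

If the average should describe form going into a match, the offset version is probably the one wanted; it first yields a value at a team's fourth game.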
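Answer 1 (score: 1)
Using DT as shown reproducibly in the Note at the end, add a row-number column i and create a data table both that has two rows for each row of DT: one for the home team and one for the away team. Then use rollapply on it and insert the results back into DT. Note that because rollapply automatically handles the case where a team has fewer than 3 prior rows, no special code is needed for it.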
library(data.table)
library(zoo)

both <- rbind(
DT[, list(HomeAway = "Home", Team = HomeTeam, Season, Points = Htpoints, i = .I)],
DT[, list(HomeAway = "Away", Team = AwayTeam, Season, Points = Atpoints, i = .I)]
)
setkeyv(both, c("Season", "Team", "i"))
both[, Last3 := rollapply(Points, list(-seq(3)), mean, fill = NA_real_, na.rm = TRUE),
by = "Season,Team"]
setkeyv(both, "i")
DT[, HtLast3 := both[HomeAway == "Home", Last3]][
, AtLast3 := both[HomeAway == "Away", Last3]]
giving:
> DT
HomeTeam AwayTeam Season Htpoints Atpoints HtLast3 AtLast3
1: Mattersburg Salzburg 2015/2016 3 0 NA NA
2: Salzburg Rapid Vienna 2015/2016 0 3 NA NA
3: Admira Mattersburg 2015/2016 3 0 NA NA
4: Admira Salzburg 2015/2016 1 1 NA NA
5: Mattersburg Ried 2015/2016 3 0 NA NA
6: Ried Salzburg 2015/2016 0 3 NA 0.3333333
7: Altach Mattersburg 2015/2016 3 0 NA 2.0000000
8: Austria Vie Mattersburg 2015/2016 3 0 NA 1.0000000
9: Salzburg Altach 2015/2016 3 0 1.333333 NA
10: Mattersburg AC Wolfsberger 2015/2016 3 0 1.000000 NA
11: Salzburg Austria Vienna 2015/2016 1 1 2.333333 NA
12: Rapid Vienna Mattersburg 2015/2016 0 3 NA 1.0000000
13: Sturm Graz Salzburg 2015/2016 0 3 NA 2.3333333
14: Salzburg Grodig 2015/2016 3 0 2.333333 NA
Note: the input DT in reproducible form:
DF <-
structure(list(HomeTeam = c("Mattersburg", "Salzburg", "Admira",
"Admira", "Mattersburg", "Ried", "Altach", "Austria Vie", "Salzburg",
"Mattersburg", "Salzburg", "Rapid Vienna", "Sturm Graz", "Salzburg"
), AwayTeam = c("Salzburg", "Rapid Vienna", "Mattersburg", "Salzburg",
"Ried", "Salzburg", "Mattersburg", "Mattersburg", "Altach", "AC Wolfsberger",
"Austria Vienna", "Mattersburg", "Salzburg", "Grodig"), Season = c("2015/2016",
"2015/2016", "2015/2016", "2015/2016", "2015/2016", "2015/2016",
"2015/2016", "2015/2016", "2015/2016", "2015/2016", "2015/2016",
"2015/2016", "2015/2016", "2015/2016"), Htpoints = c(3L, 0L,
3L, 1L, 3L, 0L, 3L, 3L, 3L, 3L, 1L, 0L, 0L, 3L), Atpoints = c(0L,
3L, 0L, 1L, 0L, 3L, 0L, 0L, 0L, 0L, 1L, 3L, 3L, 0L)),
class = "data.frame", row.names = c(NA, -14L))
DT <- as.data.table(DF)
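Since the question prefers data.table, the same idea can probably be written without zoo. This is only a sketch, assuming a data.table version that provides frollmean() (1.12.0 or later); the five-row input is a small illustrative subset of the question's rows, and the column name side is made up here. frollmean() is right-aligned and includes the current game, so the result is lagged by one game with shift().

```r
library(data.table)

# Illustrative subset of the question's data (Salzburg's first five games)
DT <- data.table(
  HomeTeam = c("Mattersburg", "Salzburg", "Admira", "Ried", "Salzburg"),
  AwayTeam = c("Salzburg", "Rapid Vienna", "Salzburg", "Salzburg", "Altach"),
  Season   = "2015/2016",
  Htpoints = c(3L, 0L, 1L, 0L, 3L),
  Atpoints = c(0L, 3L, 1L, 3L, 0L)
)

# Long format: one row per team per match, keeping the match index i
both <- rbind(
  DT[, .(side = "Home", Team = HomeTeam, Season, Points = Htpoints, i = .I)],
  DT[, .(side = "Away", Team = AwayTeam, Season, Points = Atpoints, i = .I)]
)
setorder(both, Season, Team, i)

# Rolling mean of 3 games, shifted so it covers only the previous 3 games
both[, Last3 := shift(frollmean(Points, 3)), by = .(Season, Team)]

# Restore match order and copy the values back into the wide table
setorder(both, i)
DT[, HtLast3 := both[side == "Home", Last3]]
DT[, AtLast3 := both[side == "Away", Last3]]
print(DT)
```

On this subset Salzburg gets 0.333 in its fourth game and 1.333 in its fifth, in line with the desired output in the question.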
Answer 2 (score: 0)
I had a hard time working with your dataset, so I made my own that resembles yours:
Home= sample(c("A","B","C","D"),9,replace = T)
Away= sample(c("A","B","C","D"),9,replace = T)
Home_Points= sample(c(0,1,3),9,replace = T)
Away_Points= sample(c(0,1,3),9,replace = T)
dt<-data.frame(HomeTeam=Home,
AwayTeam=Away,
Htpoints=Home_Points,Atpoints=Away_Points,
stringsAsFactors = FALSE)
My dataset is:
HomeTeam AwayTeam Htpoints Atpoints
1 C C 0 1
2 D B 1 1
3 D B 3 0
4 A B 0 3
5 C D 1 3
6 C A 1 3
7 C D 1 1
8 D A 1 3
9 D B 3 3
Solution: combine the home teams, the away teams and their points
team <- as.vector(rbind(dt[,1],dt[,2]))
points<- as.vector(rbind(dt[,3],dt[,4]))
newDT<-data.frame( team=team,points=points,stringsAsFactors = FALSE)
Finally, sum the points by team, regardless of whether it played at home or away:
library(tidyverse)
newDT %>%
group_by(team) %>%
summarise_all(sum)
The result is:
team points
<chr> <dbl>
1 A 6
2 B 7
3 C 4
4 D 12
One more point: if you think the season may also vary, you can add the season to the new dataset as well and then sort by it (arrange).
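The sum above gives season totals rather than the last-3 average the question asks for. A minimal base R sketch of how the same long format could be extended, assuming the rows are in chronological order; prev3_mean is a hypothetical helper introduced here, and the data is the sample dataset printed above, typed in by hand:

```r
# The sample dataset shown above
dt <- data.frame(
  HomeTeam = c("C", "D", "D", "A", "C", "C", "C", "D", "D"),
  AwayTeam = c("C", "B", "B", "B", "D", "A", "D", "A", "B"),
  Htpoints = c(0, 1, 3, 0, 1, 1, 1, 1, 3),
  Atpoints = c(1, 1, 0, 3, 3, 3, 1, 3, 3),
  stringsAsFactors = FALSE
)

# Interleave home and away so each match contributes two consecutive rows
newDT <- data.frame(
  team   = as.vector(rbind(dt$HomeTeam, dt$AwayTeam)),
  points = as.vector(rbind(dt$Htpoints, dt$Atpoints)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean of the three games before game k, NA if fewer exist
prev3_mean <- function(x) {
  vapply(seq_along(x), function(k) {
    if (k <= 3) NA_real_ else mean(x[(k - 3):(k - 1)])
  }, numeric(1))
}

# Apply the helper within each team, keeping the original row order
newDT$last3 <- ave(newDT$points, newDT$team, FUN = prev3_mean)
print(newDT)
```

Because the rows stay interleaved, odd rows of newDT belong to the home side and even rows to the away side of each match, so the values can be copied back into dt by position.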