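So I've got all this data from the EPL, and I'm now trying to create a column for each team's "form" based on their last five matches. A win counts as 1 point, a draw as 0.5 and a loss as zero. I have a loop that does this for one team at a time, but when I try to combine all the teams into one data frame I can't get it to work for some reason. My data comes from: http://www.football-data.co.uk/englandm.php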
To keep it simple, I'll only use Premier League data from a single season (the `1415` file below, i.e. 2014/2015).
I import the data from the CSV sheet and call it PL_20:
```r
library(RCurl)
library(lubridate)  # dmy() for parsing the dd/mm/yy dates

URL <- "www.football-data.co.uk/mmz4281/1415/E0.csv"
x <- getURL(URL, ssl.verifypeer = FALSE)
PL_20 <- read.csv(text = x, stringsAsFactors = FALSE)

# cleaning data, getting rid of the betting odds
data <- PL_20[, -c(1, 10, 11, 24:65)]

# Get dates all in the same date format
data$Date <- dmy(data$Date)

# sorting, newest match first (used for all the seasons)
sorted <- data[order(data$Date, decreasing = TRUE), ]
sorted$index <- seq_len(nrow(sorted))

# making a column for form
teams <- unique(data$HomeTeam)
z <- 1  # controls which team the form is being found for. Ideally I would have
        # this cycle through all of the teams
team <- teams[z]
current <- subset(sorted, HomeTeam == team | AwayTeam == team)
current$h.form <- 0
current$a.form <- 0
current$recent <- 0

# result of each match for `team`: win = 1, draw = 0.5, loss = 0
for (i in seq_len(nrow(current))) {
  home <- current$HomeTeam[i] == team
  if (current$FTR[i] == "D") {
    current$recent[i] <- 0.5
  } else if ((home && current$FTR[i] == "H") || (!home && current$FTR[i] == "A")) {
    current$recent[i] <- 1
  } else {
    current$recent[i] <- 0
  }
}

# form = sum of the five previous results; rows are newest first, so the
# five matches before row r are rows r+1 .. r+5
n <- nrow(current)
for (r in seq_len(n - 5)) {
  prev5 <- sum(current$recent[(r + 1):(r + 5)])
  if (current$HomeTeam[r] == team) {
    current$h.form[r] <- prev5
  } else {
    current$a.form[r] <- prev5
  }
}
```
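These ugly loops produce a data frame called `current` with three extra columns at the end: `h.form`, `a.form` and `recent`. `h.form` is the form of the designated home team for that match, and `a.form` is the form of the designated away team. `recent` is just the result of that match for the team being processed.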
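The first loop only converts `FTR` into points from one team's point of view, so it can be replaced by a vectorised helper. This is a minimal sketch: `form_points` is a hypothetical name, and it assumes `FTR` holds the `H`/`D`/`A` codes used in the football-data.co.uk files and that `team` played in every row passed to it.

```r
# Hypothetical helper replacing the first loop: points for `team` in each match.
# Assumed FTR codes: "H" = home win, "D" = draw, "A" = away win.
form_points <- function(home_team, ftr, team) {
  is_home <- home_team == team
  ifelse(ftr == "D", 0.5,
         ifelse((ftr == "H" & is_home) | (ftr == "A" & !is_home), 1, 0))
}

# illustrative data only
form_points(c("Arsenal", "Chelsea", "Arsenal"), c("H", "H", "D"), "Arsenal")
# [1] 1.0 0.0 0.5
```

With the question's objects this would be `current$recent <- form_points(current$HomeTeam, current$FTR, teams[z])`.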
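I'd like to combine all the teams so that there is one observation per match, with both `h.form` and `a.form` filled in with the correct values for the respective teams.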
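One way to get a single row per match is to run the per-team computation for every team and write each value back into `sorted` by its `index`. The sketch below assumes `sorted` is ordered newest first and has the `HomeTeam`, `AwayTeam`, `FTR` and `index` columns from the code above; `fill_form` is a hypothetical name, and matches without five earlier games are left as `NA` rather than 0.

```r
# Hypothetical sketch: loop over every team, compute its form, and write it
# into the single match table using the `index` column as the match key.
fill_form <- function(sorted, k = 5) {
  sorted$h.form <- NA_real_
  sorted$a.form <- NA_real_
  teams <- unique(c(as.character(sorted$HomeTeam), as.character(sorted$AwayTeam)))
  for (team in teams) {
    current <- sorted[sorted$HomeTeam == team | sorted$AwayTeam == team, ]
    home <- current$HomeTeam == team
    # this team's result in each match: draw = 0.5, win = 1, loss = 0
    recent <- ifelse(current$FTR == "D", 0.5,
                     ifelse((current$FTR == "H") == home, 1, 0))
    n <- nrow(current)
    for (r in seq_len(max(n - k, 0))) {
      # rows are newest first, so the k previous matches are rows r+1 .. r+k
      f <- sum(recent[(r + 1):(r + k)])
      row <- sorted$index == current$index[r]
      if (home[r]) sorted$h.form[row] <- f else sorted$a.form[row] <- f
    }
  }
  sorted
}
```

Usage would be `sorted <- fill_form(sorted)`; every team writes only its own side (`h.form` when it was at home, `a.form` when away), so each match ends up with both values.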
Any suggestions on how to clean up these loops would also be appreciated.
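To drop the explicit loops, one option (a swapped-in technique, not the approach in the question) is to stack every match twice, once per team, and compute the sum of the previous five results within each team using `ave()` and `cumsum()`. Minimal sketch, assuming the question's column names plus an `index` column that identifies each match; `add_form` is a hypothetical name, and teams' first five matches get `NA`.

```r
# Hypothetical sketch: one row per (match, team), then a rolling sum per team.
add_form <- function(matches, k = 5) {
  side <- function(team, is_home) data.frame(
    index = matches$index, Date = matches$Date, team = as.character(team),
    is_home = is_home, FTR = as.character(matches$FTR), stringsAsFactors = FALSE)
  long <- rbind(side(matches$HomeTeam, TRUE), side(matches$AwayTeam, FALSE))
  # points from this team's point of view: draw 0.5, win 1, loss 0
  long$pts <- ifelse(long$FTR == "D", 0.5,
                     ifelse((long$FTR == "H") == long$is_home, 1, 0))
  # oldest first within each team, then sum of the k previous results
  long <- long[order(long$team, long$Date), ]
  long$form <- ave(long$pts, long$team, FUN = function(p) {
    cs <- c(0, cumsum(p))          # cs[j] = sum(p[1:(j - 1)])
    out <- rep(NA_real_, length(p))
    if (length(p) > k) {
      i <- (k + 1):length(p)
      out[i] <- cs[i] - cs[i - k]  # = sum(p[(i - k):(i - 1)])
    }
    out
  })
  home <- long[long$is_home, ]
  away <- long[!long$is_home, ]
  matches$h.form <- home$form[match(matches$index, home$index)]
  matches$a.form <- away$form[match(matches$index, away$index)]
  matches
}
```

Because the window is a difference of cumulative sums, each team needs only one pass over its results, and changing `k` gives a different form window. It does not depend on the row order of `matches`, since it sorts by `Date` internally.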