Matching unequal data frames on a unique identifier

Posted: 2014-11-26 20:02:47

Tags: r loops merge matching

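So I've pulled all this EPL data, and I'm now trying to create a column for each team's "form" based on its last five matches. A win counts as 1 point, a draw as 0.5 and a loss as 0. I have a loop that does this for one team at a time, but when I try to write one that combines all the teams, I can't get it to work for some reason. My data comes from: http://www.football-data.co.uk/englandm.php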
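
The scoring rule is just a lookup followed by a sum over the last five results. A minimal R sketch of that arithmetic; result_points() is a hypothetical helper and the five results are made up, not real EPL data:

```r
# Hypothetical helper: points for a vector of results ("W", "D", "L"),
# seen from one team's point of view: win = 1, draw = 0.5, loss = 0
result_points <- function(res) {
  pts <- c(W = 1, D = 0.5, L = 0)
  unname(pts[res])
}

# Made-up last five results for one team
last_five <- c("W", "D", "L", "W", "W")
form <- sum(result_points(last_five))
print(form)   # 3.5
```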

To keep it simpler, I'll only use a single Premier League season (the 1415/E0.csv file loaded below is the 2014/2015 season).

I import that season's CSV file and call it PL_20:

library(RCurl)
library(lubridate)   # needed for the date parsing below
URL <- "http://www.football-data.co.uk/mmz4281/1415/E0.csv"
x <- getURL(URL, ssl.verifypeer = FALSE)
PL_20 <- read.csv(textConnection(x))

# cleaning data, getting rid of the betting odds
data <- PL_20[,-c(1,10,11,24:65)]

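# Get all dates into the same Date format. lubridate::dmy() parses day/month/year
# strings directly; as.Date() with the vector returned by guess_formats() would
# recycle the candidate formats element by element instead of choosing one.
data$Date <- lubridate::dmy(data$Date)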

#sorting (used for all the seasons)
sorted <- data[order(data$Date, decreasing = TRUE), ]
sorted$index <- seq_len(nrow(sorted))

# list of teams (the form columns themselves are built per team below)
teams <- as.character(unique(data$HomeTeam))


test<-sorted
z<-1    # controls which team the form is being found for. Ideally I would have this cycle 
        # through all of the teams

      current<-subset(sorted, HomeTeam==as.character(teams[z]) | AwayTeam==as.character(teams[z]))
      current$h.form<-0
      current$a.form<-0
      current$recent<-0

      for (i in 1:nrow(current)){
           if((as.character(current[i,2])==as.character(teams[z]) && as.character(current[i,6])=="H") || (as.character(current[i,3])==as.character(teams[z]) && as.character(current[i,6])=="A")){
                # current[i,7]<- "W"
                current[i,24]<- 1
           }else{ 
                if((as.character(current[i,2])==as.character(teams[z]) && as.character(current[i,6])=="D") || (as.character(current[i,3])==as.character(teams[z]) && as.character(current[i,6])=="D"))
                {
                     #current[i,7]<- "D"
                     current[i,24]<- .5
                }else{ 
                     if((as.character(current[i,2])==as.character(teams[z]) && as.character(current[i,6])=="A") || (as.character(current[i,3])==as.character(teams[z]) && as.character(current[i,6])=="H"))
                     {

                          # current[i,7]<- "L"
                          current[i,24]<- 0
                     }

                }
           }
      }

      d<-0

      for (d in 0:(nrow(current)-6))
           {
                if (as.character(current[nrow(current)-(5+d),2])==as.character(teams[z])){
                current[(nrow(current)-(5+d)),22]<-as.numeric(sum(current[(nrow(current)-(4+d)):(nrow(current)-d),24]))
                }else{
                     if(as.character(current[nrow(current)-(5+d),3])==as.character(teams[z]))
                          {
                          current[(nrow(current)-(5+d)),23]<-sum(current[(nrow(current)-(4+d)):(nrow(current)-d),24])  
                     }
                }
      }

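These ugly loops produce a data frame called current with three extra columns at the end: h.form, a.form and recent. h.form is the form of that match's home team, a.form is the form of that match's away team, and recent is simply the result of that match (the points earned by the team being processed). Because only one team is processed at a time, each row gets just one of the two form values: h.form when that team played at home, a.form when it played away.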

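I'd like to combine all the teams so that there is one observation (row) per match, with both h.form and a.form filled in with the correct values for the respective teams.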
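
One way to get one row per match is to stop working team by team: reshape the matches into long format (one row per team per match), compute every team's form at once with ave(), and then merge it back twice, once on the home side and once on the away side. A base-R sketch, assuming a data frame with Date (already a Date), HomeTeam, AwayTeam and FTR columns like sorted above; add_form(), match_id and the toy fixtures are hypothetical, and the window k is shrunk to 2 so the tiny example produces values:

```r
# Hypothetical add_form(): form over the previous k matches for every team,
# returned as one row per match with h.form and a.form columns.
add_form <- function(matches, k = 5) {
  matches$HomeTeam <- as.character(matches$HomeTeam)
  matches$AwayTeam <- as.character(matches$AwayTeam)
  matches <- matches[order(matches$Date), ]          # oldest match first
  matches$match_id <- seq_len(nrow(matches))

  # Points each side earned: win = 1, draw = 0.5, loss = 0
  ftr <- as.character(matches$FTR)
  home_pts <- ifelse(ftr == "H", 1, ifelse(ftr == "D", 0.5, 0))
  away_pts <- ifelse(ftr == "A", 1, ifelse(ftr == "D", 0.5, 0))

  # Long format: one row per team per match
  long <- data.frame(
    match_id = c(matches$match_id, matches$match_id),
    team     = c(matches$HomeTeam, matches$AwayTeam),
    points   = c(home_pts, away_pts),
    stringsAsFactors = FALSE
  )
  long <- long[order(long$match_id), ]

  # Sum of the team's previous k results; NA until k games have been played
  prev_k <- function(p) {
    cs <- c(0, cumsum(p))          # cs[i] = total of the games before game i
    i <- seq_along(p)
    out <- cs[i] - cs[pmax(i - k, 1)]
    out[i <= k] <- NA
    out
  }
  long$form <- ave(long$points, long$team, FUN = prev_k)

  # Join the form back twice: once for the home side, once for the away side
  h <- setNames(long[, c("match_id", "team", "form")],
                c("match_id", "HomeTeam", "h.form"))
  a <- setNames(long[, c("match_id", "team", "form")],
                c("match_id", "AwayTeam", "a.form"))
  out <- merge(matches, h, by = c("match_id", "HomeTeam"))
  out <- merge(out, a, by = c("match_id", "AwayTeam"))
  out[order(out$match_id), ]
}

# Toy fixtures (made up) between two teams
toy <- data.frame(
  Date     = as.Date("2014-08-16") + c(0, 7, 14, 21),
  HomeTeam = c("A", "B", "A", "B"),
  AwayTeam = c("B", "A", "B", "A"),
  FTR      = c("H", "D", "A", "H"),
  stringsAsFactors = FALSE
)
res <- add_form(toy, k = 2)
print(res[, c("Date", "HomeTeam", "AwayTeam", "h.form", "a.form")])
```

Unlike the loops in the question, this leaves NA rather than 0 for matches where a team has not yet played k games, so early-season rows are not mistaken for a run of losses. Merging on both match_id and the team name is what stops the home row from picking up the away team's form.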

Any suggestions on how to clean up these loops would also be appreciated.
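
For the single-team loops themselves, the nested if/else chain can collapse into one vectorized ifelse(), and the backwards index arithmetic into a trailing sum from stats::filter() (written with the stats:: prefix because dplyr masks filter()). A sketch that rebuilds the same recent, h.form and a.form columns as current; team_form() and the toy data are hypothetical, it assumes rows are sorted newest first as in sorted, and it uses column names instead of positions 22 to 24:

```r
# Hypothetical team_form(): the question's per-team loops, vectorized.
team_form <- function(sorted, tm, k = 5) {
  current <- sorted[sorted$HomeTeam == tm | sorted$AwayTeam == tm, ]
  is_home <- as.character(current$HomeTeam) == tm
  ftr <- as.character(current$FTR)

  # recent: 1 for a win, 0.5 for a draw, 0 for a loss (from tm's side)
  won <- (is_home & ftr == "H") | (!is_home & ftr == "A")
  current$recent <- ifelse(won, 1, ifelse(ftr == "D", 0.5, 0))

  # Form = sum of the k games before each row. Rows are newest first, so
  # reverse to oldest first, take a trailing k-game sum, and shift it by
  # one game so the current match is excluded.
  n <- nrow(current)
  form <- rep(0, n)                  # like the loops: too-early rows stay 0
  if (n > k) {
    oldest_first <- rev(current$recent)
    s <- as.numeric(stats::filter(oldest_first, rep(1, k), sides = 1))
    prev <- c(NA, s[-n])             # prev[j] = sum of games j-k .. j-1
    form <- rev(prev)
    form[is.na(form)] <- 0
  }
  current$h.form <- ifelse(is_home, form, 0)
  current$a.form <- ifelse(is_home, 0, form)
  current
}

# Toy data (made up), newest match first
toy <- data.frame(
  Date     = as.Date("2014-09-27") - c(0, 7, 14, 21),
  HomeTeam = c("A", "B", "A", "C"),
  AwayTeam = c("B", "A", "C", "A"),
  FTR      = c("H", "H", "D", "A"),
  stringsAsFactors = FALSE
)
cur <- team_form(toy, "A", k = 2)
print(cur[, c("HomeTeam", "AwayTeam", "recent", "h.form", "a.form")])
```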

0 answers