In RStudio, how do I stop my for loop from overwriting the stored output with the latest output?

Time: 2019-04-12 14:51:07

Tags: r web-scraping

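I built a for loop that should store the first names and surnames of soccer players (scraped from a website) in separate columns, but the loop overwrites all stored results with the latest output.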


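When I use a test vector of three player IDs to look up the first names and surnames, the loop stores the third player's name three times in a row. (Lahm's ID = 14)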

library(rvest) # provides read_html(), html_node() and html_text()

noplayers <- 3 # the number of players I want to run the loop for while testing my code
playeridtest <- playerid[1:noplayers] # assign the three IDs to a vector
playernames <- rep(NA, noplayers) 
playernames <- as.data.frame(playernames) # Create an empty data frame to store results in
playernames$id <- playeridtest # Add the three player IDs to the ID column

for(i in playeridtest){
  scoresway <- paste("http://www.scoresway.com?sport=soccer&page=person&id=",i, sep="")
  scoresway <- read_html(scoresway)
  urlnodescorefirst <- html_node(scoresway, "dd:nth-child(2)")
  urltextscorefirst <- html_text(urlnodescorefirst)
  playernames$first <- urltextscorefirst
  urlnodescoresur <- html_node(scoresway, "dd:nth-child(4)")
  urltextscoresur <- html_text(urlnodescoresur)
  playernames$sur <- urltextscoresur
}

1 answer:

Answer 0 (score: 1)

for(i in seq_along(playeridtest)) { # Note change here
  scoresway <- paste("http://www.scoresway.com?sport=soccer&page=person&id=",playeridtest[i], sep="")
  scoresway <- read_html(scoresway)
  urlnodescorefirst <- html_node(scoresway, "dd:nth-child(2)")
  urltextscorefirst <- html_text(urlnodescorefirst)
  playernames$first[i] <- urltextscorefirst
  urlnodescoresur <- html_node(scoresway, "dd:nth-child(4)")
  urltextscoresur <- html_text(urlnodescoresur)
  playernames$sur[i] <- urltextscoresur
}

Result:

playernames
  playernames id   first          sur
1          NA  4 Maarten Stekelenburg
2          NA 11  Robert         Huth
3          NA 14 Philipp         Lahm
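
The difference between the two loops can be reproduced without any scraping. The sketch below is only an illustration: `first_names`, `overwritten` and `kept` are made-up names, and the lookup vector stands in for the scraped values (taken from the result above). It shows that assigning a single value to a whole column recycles it into every row, while assigning to `[i]` writes one row per iteration.

```r
# Hypothetical stand-in for the scraped first names, keyed by player ID
# (values copied from the result table; no network access needed).
first_names <- c("4" = "Maarten", "11" = "Robert", "14" = "Philipp")
ids <- c(4, 11, 14)

# Whole-column assignment: the single value is recycled to every row,
# so each iteration replaces what the previous one stored.
overwritten <- data.frame(id = ids, first = NA_character_, stringsAsFactors = FALSE)
for (i in ids) {
  overwritten$first <- first_names[[as.character(i)]]
}

# Indexed assignment: only row i is written on each iteration.
kept <- data.frame(id = ids, first = NA_character_, stringsAsFactors = FALSE)
for (i in seq_along(ids)) {
  kept$first[i] <- first_names[[as.character(ids[i])]]
}

print(overwritten)
print(kept)
```

Looping over `seq_along(ids)` instead of the IDs themselves matters here because the IDs (4, 11, 14) are not valid row positions in a three-row data frame.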

The playernames$playernames column comes from the code you included at the top. Just skip those two lines and use the third line instead:

# playernames <- rep(NA, noplayers)
# playernames <- as.data.frame(playernames)
playernames <- NULL
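
One thing to be aware of: in base R, assigning `playernames$id` onto `NULL` gives a plain list rather than a data frame. If a data frame is wanted from the start, one possible alternative is to create all the columns up front. This is only a sketch: `lookup_name()` is a hypothetical function standing in for the `read_html()` / `html_node()` / `html_text()` steps, and it returns the names from the result above instead of scraping them.

```r
# Hypothetical replacement for the scraping step: returns the first name and
# surname for one player ID. In the real loop this would wrap read_html(),
# html_node() and html_text().
lookup_name <- function(id) {
  known <- list(
    "4"  = c(first = "Maarten", sur = "Stekelenburg"),
    "11" = c(first = "Robert",  sur = "Huth"),
    "14" = c(first = "Philipp", sur = "Lahm")
  )
  known[[as.character(id)]]
}

playeridtest <- c(4, 11, 14)

# Create every column up front so the object is a data frame from the start
# and no placeholder column is left behind.
playernames <- data.frame(
  id = playeridtest,
  first = NA_character_,
  sur = NA_character_,
  stringsAsFactors = FALSE
)

# Fill one row per iteration, indexing by position rather than by ID.
for (i in seq_along(playeridtest)) {
  name <- lookup_name(playeridtest[i])
  playernames$first[i] <- name[["first"]]
  playernames$sur[i] <- name[["sur"]]
}

print(playernames)
```

Typing the empty columns as `NA_character_` keeps them character columns throughout, so the names are stored as plain strings.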