for loop does not run through all iterations when using RSelenium

Time: 2016-04-03 19:13:58

Tags: r for-loop rselenium

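Hello. On this page: http://www.uefa.com/uefachampionsleague/season=2016/statistics/round=2000634/players/index.html

I am trying to use RSelenium to click each player name in the list of links, scrape that player's page, go back, and continue with the next player.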

# packages
library(RSelenium)
library(XML)


# navigate to the site
remDr <- remoteDriver$new()
remDr$open()
remDr$navigate("http://www.uefa.com/uefachampionsleague/season=2016/statistics/round=2000634/players/index.html")

# find all the player links
player <- remDr$findElements(using = 'xpath', value = "//span/a")

# confirm that there are 20 links
length(player)


# this loop is supposed to click through to each of the 20 pages, scrape some info, and go back
for (i in 1:20) {

    player <- remDr$findElements(using = 'xpath', value = "//span/a")
    player[[i]]$clickElement()
    Sys.sleep(5)
    urlplayer <- remDr$getCurrentUrl()
    urlplayer2 <- htmlParse(urlplayer)
    hraci <- xpathSApply(urlplayer2, path = "//ul[@class='innerText']/li", fun = xmlValue)
    print(hraci)
    remDr$goBack()
}

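I have run this code several times, but after some number of iterations I always get the error: Error in player[[i]] : subscript out of bounds

When I check the value of the iterator after the last attempt, it is 7, sometimes 12, and sometimes another number.

I have no idea why I am getting this error, so any help is appreciated!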

1 Answer:

Answer 0: (score: 0)

I would suggest a different approach that does not need Selenium:

library(XML)

# parse the page fragment that holds the player table
doc <- htmlParse("http://www.uefa.com/statistics/uefachampionsleague/season=2016/statistics/round=2000634/players/_loadRemaining.html", encoding = "UTF-8")

# keep only the first n players; raise n to fetch more
n <- 3
hrefs <- head( xpathSApply(doc, "//tr/td[1]/span/a", xmlGetAttr, "href"), n )
players <- head( xpathSApply(doc, "//tr/td[1]/span/a", xmlValue), n )

# save each player's page to the temp directory, named after the player
for (x in seq_along(hrefs)) {
  download.file(paste0("http://www.uefa.com", hrefs[x]), file.path(tempdir(), paste0(players[x], ".html")) )
}

# read the tables from the first downloaded page
x <- 1
readHTMLTable(file.path(tempdir(), paste0(players[x], ".html")))
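
If you would rather stay with Selenium: the error itself only says that findElements() returned fewer than i elements on that pass. One plausible cause, not confirmed in this thread, is that the list has not finished re-rendering after goBack(). A minimal sketch that sidesteps this is to read every href once, up front, and then navigate to each URL directly instead of clicking and going back. The helper names collect_player_urls and scrape_players are hypothetical, remDr is assumed to be the open remoteDriver from the question, and the XML package is assumed to be installed.

```r
# Hypothetical helper: read the href of every matching link in a single pass.
# `remDr` is assumed to be an open RSelenium remoteDriver already on the list page.
collect_player_urls <- function(remDr, xpath = "//span/a") {
  links <- remDr$findElements(using = "xpath", value = xpath)
  hrefs <- vapply(
    links,
    function(el) {
      href <- el$getElementAttribute("href")
      # guard against links that carry no href attribute
      if (length(href) == 0 || is.null(href[[1]])) {
        NA_character_
      } else {
        as.character(href[[1]])
      }
    },
    character(1)
  )
  # drop missing values and repeated links
  unique(hrefs[!is.na(hrefs)])
}

# Hypothetical helper: visit each URL directly and pull out the list items
# that the question's XPath targets. Needs the XML package.
scrape_players <- function(remDr, urls, pause = 5) {
  lapply(urls, function(u) {
    remDr$navigate(u)
    Sys.sleep(pause)  # same fixed wait as in the question
    page <- XML::htmlParse(remDr$getPageSource()[[1]], asText = TRUE)
    XML::xpathSApply(page, "//ul[@class='innerText']/li", XML::xmlValue)
  })
}
```

With the session from the question, this would be called as urls <- collect_player_urls(remDr) followed by info <- scrape_players(remDr, urls). Because the loop runs over the URLs that were actually found, it cannot index past the end of the list, whatever the page happens to return.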