Hello. On this page, http://www.uefa.com/uefachampionsleague/season=2016/statistics/round=2000634/players/index.html
I am trying to use RSelenium to click every player name link, scrape that player's page, go back, and continue with the next player.
# packages
library(RSelenium)
library(XML)
# navigation to the site
remDr <- remoteDriver$new()
remDr$open()
remDr$navigate("http://www.uefa.com/uefachampionsleague/season=2016/statistics/round=2000634/players/index.html")
# this will find all needed links
player<-remDr$findElements(using = 'xpath',value = "//span/a")
# this confirms that there are 20 links
length(player)
# this loop is supposed to click through to each of the 20 pages, scrape some info, go back, and proceed
for (i in 1:20) {
player<-remDr$findElements(using = 'xpath',value = "//span/a")
player[[i]]$clickElement()
Sys.sleep(5)
urlplayer<-remDr$getCurrentUrl()
urlplayer2<-htmlParse(urlplayer)
hraci<-xpathSApply(urlplayer2,path = "//ul[@class='innerText']/li",fun = xmlValue)
print(hraci)
remDr$goBack()
}
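I have run this code several times, but after some iterations I always get the error: Error in player[[i]] : subscript out of bounds
When I check the value of the iterator after a failed run, it is sometimes 7, sometimes 12, and sometimes another number.
I don't know why I am getting this error, so any help is appreciated!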
Answer 0 (score: 0)
I suggest a different approach that does not need Selenium:
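library(XML)
# parse the player list directly from the URL below
doc <- htmlParse("http://www.uefa.com/statistics/uefachampionsleague/season=2016/statistics/round=2000634/players/_loadRemaining.html", encoding = "UTF-8")
n <- 3  # number of players to fetch
# relative links and display names of the first n players
hrefs <- head(xpathSApply(doc, "//tr/td[1]/span/a", xmlGetAttr, "href"), n)
players <- head(xpathSApply(doc, "//tr/td[1]/span/a", xmlValue), n)
# save each player's page to a temporary file named after the player
for (x in seq_along(hrefs)) {
  download.file(paste0("http://www.uefa.com", hrefs[x]), file.path(tempdir(), paste0(players[x], ".html")))
}
# read the tables from the first downloaded page
x <- 1
readHTMLTable(file.path(tempdir(), paste0(players[x], ".html")))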
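
If Selenium is still preferred, the same idea of collecting the links first can be applied there too. The thread does not confirm the cause of the error, but one plausible explanation is that after goBack() the link list is re-read before the index page has finished loading, so it sometimes holds fewer than i elements. The sketch below avoids indexing into a re-found list by reading every href once and then navigating to each URL directly. The helper name scrape_players and its parameters link_xpath and pause are hypothetical, and it assumes the href attributes come back as absolute URLs.

```r
library(XML)

# Hypothetical helper: `remDr` is assumed to be an already opened
# RSelenium remoteDriver that is showing the player index page.
scrape_players <- function(remDr, link_xpath = "//span/a", pause = 2) {
  # Collect every href once, while the index page is loaded.
  links <- remDr$findElements(using = "xpath", value = link_xpath)
  hrefs <- vapply(
    links,
    function(el) el$getElementAttribute("href")[[1]],
    character(1)
  )

  results <- vector("list", length(hrefs))
  for (i in seq_along(hrefs)) {
    # Go straight to the player page instead of clicking and going back.
    remDr$navigate(hrefs[i])
    Sys.sleep(pause)  # crude wait; the right pause length is an assumption
    # getPageSource() returns a list whose first element is the HTML string.
    page <- htmlParse(remDr$getPageSource()[[1]], asText = TRUE)
    results[[i]] <- xpathSApply(page, "//ul[@class='innerText']/li", xmlValue)
  }
  names(results) <- hrefs
  results
}
```

With a session opened as in the question (remDr$open() followed by remDr$navigate(...) to the index page), calling scrape_players(remDr) would return one list entry per player link. Parsing the page source rather than the URL also means the content scraped is what the browser actually rendered.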