I'm trying to scrape this website: www.oddsportal.com. Here is my code in R:
library(wdman)
library(RSelenium)
library(rvest)
library(data.table)
pjs <- wdman::phantomjs(port=8912L)
eCap <- list(
  phantomjs.page.settings.userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20120101 Firefox/29.0",
  phantomjs.page.settings.loadImages = FALSE,
  phantomjs.phantom.cookiesEnabled = FALSE,
  phantomjs.phantom.javascriptEnabled = TRUE
)
remDr <- remoteDriver(port = 8912L, browser = "phantomjs", extraCapabilities = eCap)
remDr$open()
#login form
remDr$navigate("https://www.oddsportal.com/login")
remDr$findElement('name', 'login-submit')$clickElement()
remDr$findElement(using = 'css selector', "#login-username1")$sendKeysToElement(list("*****"))
remDr$findElement(using = 'css selector', "#login-password1")$sendKeysToElement(list("*****"))
remDr$findElement(using = 'css selector', '#col-content > div:nth-child(3) > div > form > div:nth-child(3) > button')$clickElement()
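# loop through 10 000 urls and save page source to file[i]
# (DT is a data.table with a `links` column, not shown here)
file <- character(10000)
i <- 1
while (i <= 10000) {
  remDr$navigate(DT$links[i])
  file[i] <- remDr$getPageSource()[[1]]
  i <- i + 1
}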
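The loop above sends all 10,000 page loads through a single PhantomJS session and keeps the results only in memory, so one crash loses the whole run. Below is a minimal sketch of a batched alternative. It assumes, without having verified this for the site, that restarting the browser every few dozen pages avoids the failure. `scrape_in_batches()` and its three callback arguments are hypothetical names and are not part of RSelenium. The callbacks are passed in as arguments so the batching logic can be tested without a browser.

```r
# Hypothetical helper: visit `urls` in batches of `batch_size`, using a fresh
# session for each batch so one long-lived browser never handles every page.
#   open_session()       -> returns a session object (e.g. a logged-in driver)
#   fetch(session, url)  -> returns the page source as one string
#   close_session(s)     -> shuts the session down
scrape_in_batches <- function(urls, batch_size, open_session, fetch, close_session) {
  pages <- character(length(urls))
  # Group indices 1..n into consecutive runs of at most batch_size
  batches <- split(seq_along(urls), ceiling(seq_along(urls) / batch_size))
  for (idx in batches) {
    session <- open_session()
    # `finally` closes the session even if a fetch throws; the error still propagates
    tryCatch({
      for (i in idx) pages[i] <- fetch(session, urls[i])
    }, finally = close_session(session))
  }
  pages
}
```

To use it with the code above, `open_session` could start `wdman::phantomjs(port = 8912L)`, open a `remoteDriver`, log in, and return both objects in a list. `fetch` would call `navigate()` followed by `getPageSource()[[1]]`, and `close_session` would call `remDr$close()` and `pjs$stop()`.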
After roughly 100-200 iterations it fails, every time with this error:
Error in checkError(res) :
Undefined error in httr call. httr output: Recv failure: Connection was reset
Can you help me? What is causing this error? Thanks.
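RSelenium sends every command to the PhantomJS server on port 8912 as an HTTP request through httr. "Recv failure: Connection was reset" therefore means that this local connection was dropped. It does not necessarily mean that oddsportal.com refused the request. One plausible cause, unconfirmed here, is that the PhantomJS process crashes or stops responding after many page loads. The sketch below assumes the failures are transient or can be cleared by restarting the driver. `with_retry()` and its `on_failure` hook are hypothetical names. The hook is where you would restart PhantomJS and log in again.

```r
# Hypothetical helper: call f(); if it throws an error, run on_failure(e),
# wait, and try again. The wait doubles after each failed attempt.
with_retry <- function(f, max_tries = 3, base_wait = 1,
                       on_failure = function(e) invisible(NULL)) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < max_tries) {
      message(sprintf("Attempt %d failed: %s", attempt, conditionMessage(result)))
      on_failure(result)                      # e.g. restart PhantomJS here
      Sys.sleep(base_wait * 2^(attempt - 1))  # base_wait, 2*base_wait, 4*base_wait, ...
    }
  }
  stop(result)  # all attempts failed: re-raise the last error
}
```

Inside the original loop, this would wrap the `navigate()`/`getPageSource()` pair, for example `file[i] <- with_retry(function() { remDr$navigate(DT$links[i]); remDr$getPageSource()[[1]] })`. Retrying alone will not help if PhantomJS has actually died, which is why the hook is there.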