Undefined error in httr call. httr output: Recv failure: Connection was reset

Time: 2019-04-17 16:02:57

Tags: r web-scraping phantomjs rvest rselenium

I am trying to scrape this website: www.oddsportal.com. Here is my R code:

library(wdman)
library(RSelenium)
library(rvest)
library(data.table)


pjs <- wdman::phantomjs(port=8912L)

eCap <- list(
  phantomjs.page.settings.userAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20120101 Firefox/29.0",
  phantomjs.page.settings.loadImages = FALSE,
  phantomjs.phantom.cookiesEnabled = FALSE,
  phantomjs.phantom.javascriptEnabled = TRUE
)


remDr<-remoteDriver(port=8912L, browser="phantomjs", extraCapabilities = eCap)

remDr$open()

#login form
remDr$navigate("https://www.oddsportal.com/login")
remDr$findElement('name', 'login-submit')$clickElement()
remDr$findElement(using = 'css selector', "#login-username1")$sendKeysToElement(list("*****"))
remDr$findElement(using = 'css selector', "#login-password1")$sendKeysToElement(list("*****"))
remDr$findElement(using = 'css selector', '#col-content > div:nth-child(3) > div > form > div:nth-child(3) > button')$clickElement()

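# loop through 10 000 URLs and save each page source to file[i]
# (DT is assumed to be an existing data.table with a `links` column)
i <- 1L
file <- character(10000)
while (i <= 10000) {
  remDr$navigate(DT$links[i])
  file[i] <- remDr$getPageSource()[[1]]
  i <- i + 1L
}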

After roughly 100-200 iterations it fails, every time with this error:

Error in checkError(res) : 
  Undefined error in httr call. httr output: Recv failure: Connection was reset
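
A minimal sketch of how the loop could tolerate a transient failure, assuming the reset is intermittent (nothing above confirms that). `with_retry` is a hypothetical helper written for illustration; it is not part of RSelenium or httr. If the PhantomJS process itself has died, retrying alone would presumably not help.

```r
# Hypothetical helper: call f() up to max_tries times,
# pausing wait_seconds between failed attempts.
with_retry <- function(f, max_tries = 3L, wait_seconds = 1) {
  stopifnot(max_tries >= 1)
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(
      list(ok = TRUE, value = f()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) return(result$value)
    if (attempt < max_tries) Sys.sleep(wait_seconds)
  }
  # Every attempt failed: re-signal the last error.
  stop(result$error)
}

# Usage inside the loop (remDr and DT as defined in the question):
# file[i] <- with_retry(function() {
#   remDr$navigate(DT$links[i])
#   remDr$getPageSource()[[1]]
# }, max_tries = 3L, wait_seconds = 5)
```

Wrapping both the navigation and the page-source call in one function means a failure in either step repeats the whole fetch for that URL.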

Can you help me? What is causing this error? Thanks.

0 answers:

No answers yet.