Creating a timeout handler for web scraping with RSelenium

Date: 2015-12-18 12:01:40

Tags: r web-scraping phantomjs rselenium

I am building a scraper with RSelenium and PhantomJS. Sometimes a query to a website takes too long and never finishes, so I am writing a timeout handler:

library(RSelenium)
library(R.utils)  # provides withTimeout() and the TimeoutException condition

# Start PhantomJS and open a session with a custom user agent
pJS <- phantom(pjs_cmd = "C:\\software\\phantomjs-2.0.0-windows\\bin\\phantomjs.exe")
UA <- 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0'
eCap <- list(phantomjs.page.settings.userAgent = UA)
remDr <- remoteDriver(browserName = "phantomjs", extraCapabilities = eCap)
remDr$open(silent = TRUE)

# time_out is set to 1 if navigation exceeds the time limit
time_out <- 0
tryCatch({
  withTimeout({
    remDr$navigate("http://stackoverflow.com/questions/14399205/in-r-how-to-make-the-variables-inside-a-function-available-to-the-lower-level-f")
  }, envir = globalenv(), timeout = 1.08)
}, TimeoutException = function(ex) {
  time_out <<- 1
})

But I get this error:

Undefined error in RCurl call.
Error in queryRD(paste0(serverURL, "/session/", sessionInfo$id, "/url"), :
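The tryCatch() above registers a handler only for the TimeoutException class, so any condition of another class passes straight through it. Because the queryRD() error was printed rather than handled, it evidently was not a TimeoutException. The following minimal, RSelenium-free illustration uses stop() as a stand-in for the RSelenium error; the outer handler exists only to show where the error ends up:

```r
# Inner tryCatch mirrors the question: only TimeoutException is handled.
# stop() signals a plain simpleError, standing in for the error from queryRD().
result <- tryCatch(
  tryCatch(
    stop("Undefined error in RCurl call."),
    TimeoutException = function(ex) "caught as timeout"
  ),
  # The outer `error` handler receives the condition the inner one ignored.
  error = function(e) paste("escaped inner handler:", conditionMessage(e))
)
print(result)
```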

However, if I inspect remDr afterwards:

remDr$getTitle()[[1]]
[1] "In R, how to make the variables inside a function available to the lower level function inside this function?(with, attach, environment) - Stack Overflow"

So the navigation worked! Why do I get the error, then?
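One plausible reading, not confirmed in the thread, is that the 1.08-second limit fired while the HTTP request to PhantomJS was still in flight: R abandoned the call with an ordinary error, while PhantomJS went on loading the page, which is why getTitle() returns the expected title. Under that assumption, a wrapper that handles both a TimeoutException and any other error, and returns a status instead of setting a global flag with `<<-`, can be sketched as follows. The helper name run_with_timeout() is hypothetical; withTimeout() and TimeoutException come from R.utils:

```r
library(R.utils)  # withTimeout() and the TimeoutException condition class

# Hypothetical helper: call `fun` under an elapsed-time limit and report
# what happened as list(status = "ok" | "timeout" | "error", value = ...).
run_with_timeout <- function(fun, seconds) {
  tryCatch(
    {
      value <- withTimeout(fun(), timeout = seconds, onTimeout = "error")
      list(status = "ok", value = value)
    },
    # Signalled by withTimeout() when the limit is hit at an R-level check point.
    TimeoutException = function(ex) list(status = "timeout", value = NULL),
    # Any other error, such as one raised by an interrupted RCurl request.
    error = function(e) list(status = "error", value = conditionMessage(e))
  )
}
```

In the scraper this would be called as `res <- run_with_timeout(function() remDr$navigate(url), 10)`, where `url` and the 10-second limit are placeholders; when `res$status` is not `"ok"`, calling `remDr$getTitle()` shows whether the page loaded anyway.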

1 answer:

Answer (score: 0):

Update Java and check that the Selenium WebDriver version you are running is the latest one. This solved the problem.
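To check which Java runtime R would launch, for example before and after updating as suggested above, here is a small sketch. It assumes `java` is on the PATH; the helper name java_version() is hypothetical. `java -version` writes to stderr, so both streams are captured:

```r
# Hypothetical helper: extract the version string from `java -version` output.
# By default it runs java; pass `lines` explicitly to parse saved output.
java_version <- function(lines = system2("java", "-version",
                                         stdout = TRUE, stderr = TRUE)) {
  first <- grep('version "', lines, value = TRUE)[1]  # e.g. java version "1.8.0_66"
  if (is.na(first)) return(NA_character_)             # no version line found
  sub('.*version "([^"]+)".*', "\\1", first)          # keep the quoted part
}
```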