r - Learning RSelenium: some basic beginner technical questions

Time: 2016-11-01 22:38:11

Tags: r web-scraping

I looked at https://github.com/ropensci/RSelenium/issues/94 and https://github.com/ropensci/RSelenium/issues/82, but they did not solve my problem. The people there are on Windows, which doesn't help; I'm on a Mac (El Capitan, version 10.11.6).

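I'm trying to learn data scraping with RSelenium, but some of its technical aspects have caused me problems very early on. I'll ask a couple of questions first and share my code along the way:

(1) It immediately tells me that startServer() is deprecated. Specifically: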

startServer()

# output
Warning message:
startServer is deprecated.
Users in future can find the function in 
file.path(find.package("RSelenium"), "example/serverUtils").
The sourcing/starting of a Selenium Server is a users responsiblity. 
Options include manually starting a server see 
vignette("RSelenium-basics", package = "RSelenium")
and running a docker container see  
vignette("RSelenium-docker", package = "RSelenium")

What should I use instead of startServer(), or what do I need to change on my computer? I'm confused by what this warning message is telling me.

(2) Since it is only a warning, I went ahead and tried to open a browser with Chrome. I quickly ran into another error:

remDr = remoteDriver$new(browserName = 'chrome')
remDr$open()

# output 
[1] "Connecting to remote server"
$webdriver.remote.sessionid
[1] "4d0ad1d9-1c4b-4171-8dce-ba8363f5849e"

$locationContextEnabled
[1] TRUE

$webStorageEnabled
[1] TRUE

$takesScreenshot
[1] TRUE

$javascriptEnabled
[1] TRUE

$message
[1] "session not created exception\nfrom unknown error: Runtime.executionContextCreated has invalid 'context': {\"auxData\":{\"frameId\":\"34144.1\",\"isDefault\":true},\"id\":1,\"name\":\"\",\"origin\":\"://\"}\n  (Session info: chrome=54.0.2840.71)\n  (Driver info: chromedriver=2.20.353124 (035346203162d32c80f1dce587c8154a1efa0c3b),platform=Mac OS X 10.11.6 x86_64)"

$hasTouchScreen
[1] TRUE

$platform
[1] "ANY"

$cssSelectorsEnabled
[1] TRUE

$id
[1] "4d0ad1d9-1c4b-4171-8dce-ba8363f5849e"

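The $message line of the output says that no session was created. On my desktop, what I see is Chrome opening briefly and then closing or crashing, never really staying open. I tried again with Firefox and got: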

remDr = remoteDriver$new(browserName = 'firefox')
remDr$open()

# output 
[1] "Connecting to remote server"

Selenium message:The path to the driver executable must be set by the webdriver.gecko.driver system property; for more information, see https://github.com/mozilla/geckodriver. The latest version can be downloaded from https://github.com/mozilla/geckodriver/releases

Error:   Summary: UnknownError
     Detail: An unknown server-side error occurred while processing the command.
     class: java.lang.IllegalStateException
     Further Details: run errorDetails method
It's frustrating to try to learn this and not even get past the first step of opening a browser. Any help is much appreciated!

1 Answer:

Answer 0 (score: 1)

As the warning says, checkForServer and startServer are deprecated, but you can still use them as follows:

unlink(file.path(find.package("RSelenium"), "bin"), recursive = TRUE, force = TRUE)
RSelenium::checkForServer()
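
The deprecation warning itself suggests starting the server manually. A minimal sketch of what that might look like from R is below. The jar path and the helper name selenium_args are hypothetical, introduced only for illustration; the -port flag is the same one passed to startServer further down. It assumes java is on your PATH.

```r
# Hypothetical location of the standalone server jar; adjust to wherever
# you actually saved it. This path is an assumption, not an RSelenium default.
selenium_jar <- "~/selenium/selenium-server-standalone.jar"

# Build the argument vector for `java -jar <jar> -port <port>`.
selenium_args <- function(jar, port = 5556L) {
  c("-jar", path.expand(jar), "-port", as.character(port))
}

# Launch the server in the background only if the jar is really there.
if (file.exists(path.expand(selenium_jar))) {
  system2("java", selenium_args(selenium_jar), wait = FALSE)
}
```

Because wait = FALSE returns immediately, you would need to stop the java process yourself when you are done.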

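For Firefox:

In a terminal, run the following command:

brew install geckodriver

There is a problem running Selenium on its default port on a Mac, because Kerberos is already running on the default port 4444. Run the following commands in the R console, which start the server on port 5556 instead: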

selServ <- RSelenium::startServer(args = c("-port 5556"))
remDr <- RSelenium::remoteDriver(extraCapabilities = list(marionette = TRUE), port=5556)
remDr$open()
......
# when finished
selServ$stop()
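
If you want to check whether something is already listening on a port before picking one, a rough base-R sketch follows. The helper name port_in_use is made up for this example; it only tests whether a TCP connection to the port succeeds and says nothing about which program owns it.

```r
# Returns TRUE if a TCP connection to host:port can be opened, FALSE otherwise.
# port_in_use is a hypothetical helper, not part of RSelenium.
port_in_use <- function(port, host = "localhost", timeout = 1) {
  con <- suppressWarnings(tryCatch(
    socketConnection(host = host, port = port,
                     blocking = TRUE, timeout = timeout),
    error = function(e) NULL   # connection refused or timed out
  ))
  if (is.null(con)) return(FALSE)
  close(con)                   # we only wanted to know it was reachable
  TRUE
}

port_in_use(4444L)  # Selenium's default port
port_in_use(5556L)  # the alternative port used in this answer
```

If the first call returns TRUE before you have started any Selenium server, some other service is occupying 4444 and a different port is the safer choice.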

For Chrome:

brew install chromedriver

Again, running Selenium on the default port is a problem on a Mac. Run the following commands in the R console:

selServ <- RSelenium::startServer(args = c("-port 5556"))
remDr <- RSelenium::remoteDriver(browserName = "chrome", 
                                 extraCapabilities = list(marionette = TRUE),
                                 port=5556)
remDr$open()
......
# when finished
selServ$stop()
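
The Chrome error in the question reports both the browser version and the driver version, and a driver that is much older than the browser is a likely cause of "session not created". As a small illustration, the versions can be pulled out of the message text like this. The helper extract_version is hypothetical; the message string is copied from the question.

```r
# Relevant part of the $message field from the question.
msg <- paste0(
  "(Session info: chrome=54.0.2840.71)\n",
  "  (Driver info: chromedriver=2.20.353124 ",
  "(035346203162d32c80f1dce587c8154a1efa0c3b),",
  "platform=Mac OS X 10.11.6 x86_64)"
)

# Extract the dotted version number that follows "<key>=" in the text.
extract_version <- function(text, key) {
  pattern <- paste0(key, "=([0-9.]+)")
  hit <- regmatches(text, regexpr(pattern, text))
  if (length(hit) == 0) return(NA_character_)
  sub(paste0(key, "="), "", hit, fixed = TRUE)
}

extract_version(msg, "chrome")        # "54.0.2840.71"
extract_version(msg, "chromedriver")  # "2.20.353124"
```

After brew install chromedriver, opening a session again and comparing these two values is a quick way to confirm that the newer driver is actually the one being picked up.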

If the above doesn't help, consider running a Docker container instead; see http://rpubs.com/johndharrison/RSelenium-Docker and https://github.com/SeleniumHQ/docker-selenium. This basically involves starting a Docker container with:

$ docker run -d -p 5556:4444 selenium/standalone-chrome:3.0.1-aluminum

The Selenium server and Chrome browser are then reachable on port 5556, and you can connect by passing the appropriate arguments to remoteDriver.
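
A minimal sketch of that connection, assuming the container from the docker run command is up and its port 5556 is reachable on localhost (if Docker runs inside a VM, the address would be the VM's IP instead). The wrapper name open_docker_chrome and the test URL are only illustrative.

```r
# Open a Chrome session on a Selenium server exposed by a Docker container.
# open_docker_chrome is a hypothetical wrapper; host and port are assumptions
# matching the `-p 5556:4444` mapping used above.
open_docker_chrome <- function(host = "localhost", port = 5556L) {
  remDr <- RSelenium::remoteDriver(
    remoteServerAddr = host,
    port             = port,
    browserName      = "chrome"
  )
  remDr$open()
  remDr
}

# Only try to connect in an interactive session with the container running.
if (interactive()) {
  remDr <- open_docker_chrome()
  remDr$navigate("https://www.r-project.org")
  print(remDr$getTitle())
  remDr$close()
}
```

No startServer call is needed in this setup, since the server lives inside the container.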