Running selenium-server with Scrapy - unable to run Chrome Remote

Asked: 2016-12-28 15:14:37

Tags: python linux selenium ssh scrapy

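Community,

Quick intro: I'm quite new to Python and am trying to run a Scrapy script to crawl a real-estate website (www.immoweb.be). Unfortunately, the site is behind Incapsula protection, which requires me to run Selenium first so that I access the site through a real browser.

I access my remote server over SSH and start selenium-server (agora is the project folder that contains the chromedriver binary):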

export PATH=$PATH:/home/<username>/agora
java -jar selenium-server-standalone-2.53.1.jar webdriver.chrome.driver=agora
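A side note on the command itself, as a minimal sketch rather than a confirmed fix: Java system properties are normally passed as `-Dname=value` before `-jar`, and `webdriver.chrome.driver` usually takes the full path to the chromedriver executable rather than its folder. Anything written after the jar name is handed to the server as a plain program argument. The helper name and the example path below are hypothetical. In this case chromedriver was evidently still found (the error further down reports chromedriver=2.27), presumably through the PATH export, so this is unlikely to be the cause of the failure.

```python
def selenium_server_command(jar_path, chromedriver_path):
    """Build the argv list for a Selenium 2.x standalone server.

    The -D system property has to appear before -jar so that the JVM,
    not the server's own argument parser, receives it.
    """
    return [
        "java",
        "-Dwebdriver.chrome.driver={}".format(chromedriver_path),
        "-jar",
        jar_path,
    ]
```

For example, `selenium_server_command("selenium-server-standalone-2.53.1.jar", "/home/<username>/agora/chromedriver")` would give the argument list for `subprocess.Popen`, assuming the binary in the project folder is named `chromedriver`. The log that follows is from the command exactly as shown above, not from this sketch.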

Output:

14:53:51.308 INFO - Launching a standalone Selenium Server
14:53:51.539 INFO - Java: Oracle Corporation 24.111-b01
14:53:51.539 INFO - OS: Linux 3.16.0-4-amd64 amd64
14:53:51.602 INFO - v2.53.1, with Core v2.53.1. Built from revision a36b8b1
14:53:51.897 INFO - Driver provider org.openqa.selenium.ie.InternetExplorerDriver registration is skipped:
registration capabilities Capabilities [{platform=WINDOWS, ensureCleanSession=true, browserName=internet explorer, version=}] does not match the current platform LINUX
14:53:51.897 INFO - Driver provider org.openqa.selenium.edge.EdgeDriver registration is skipped:
registration capabilities Capabilities [{platform=WINDOWS, browserName=MicrosoftEdge, version=}] does not match the current platform LINUX
14:53:51.898 INFO - Driver class not found: com.opera.core.systems.OperaDriver
14:53:51.898 INFO - Driver provider com.opera.core.systems.OperaDriver is not registered
14:53:51.899 INFO - Driver provider org.openqa.selenium.safari.SafariDriver registration is skipped:
registration capabilities Capabilities [{platform=MAC, browserName=safari, version=}] does not match the current platform LINUX
14:53:51.918 INFO - Driver class not found: org.openqa.selenium.htmlunit.HtmlUnitDriver
14:53:51.918 INFO - Driver provider org.openqa.selenium.htmlunit.HtmlUnitDriver is not registered
14:53:52.133 INFO - RemoteWebDriver instances should connect to: http://127.0.0.1:4444/wd/hub
14:53:52.133 INFO - Selenium Server is up and running

As far as I can tell, nothing out of the ordinary here. I then start Scrapy inside a virtualenv (in a second SSH window):

scrapy shell
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
driver = webdriver.Remote(
...    command_executor='http://127.0.0.1:4444/wd/hub',
...    desired_capabilities=DesiredCapabilities.CHROME)

This produces the following:

2016-12-28 15:03:25 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:4444/wd/hub/session {"requiredCapabilities": {}, "desiredCapabilities": {"version": "", "platform": "ANY", "browserName": "chrome", "javascriptEnabled": true}}

It then hangs for about one minute and throws the following error in my selenium-server shell:

Caused by: org.openqa.selenium.WebDriverException: unknown error: Chrome failed to start: exited abnormally
  (Driver info: chromedriver=2.27.440175 (9bc1d90b8bfa4dd181fbbf769a5eb5e575574320),platform=Linux 3.16.0-4-amd64 x86_64) (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 60.98 seconds
Build info: version: '2.53.1', revision: 'a36b8b1', time: '2016-06-30 17:37:03'
System info: host: 'agora', ip: '10.132.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '3.16.0-4-amd64', java.version: '1.7.0_111'
Driver info: org.openqa.selenium.chrome.ChromeDriver
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.openqa.selenium.remote.ErrorHandler.createThrowable(ErrorHandler.java:206)
        at org.openqa.selenium.remote.ErrorHandler.throwIfResponseFailed(ErrorHandler.java:158)
        at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:678)
        at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:249)
        at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131)
        at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:144)
        at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:170)
        at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:138)
        ... 14 more
15:04:28.463 WARN - Exception: unknown error: Chrome failed to start: exited abnormally
  (Driver info: chromedriver=2.27.440175 (9bc1d90b8bfa4dd181fbbf769a5eb5e575574320),platform=Linux 3.16.0-4-amd64 x86_64) (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 60.98 seconds
Build info: version: '2.53.1', revision: 'a36b8b1', time: '2016-06-30 17:37:03'
System info: host: 'agora', ip: '10.132.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '3.16.0-4-amd64', java.version: '1.7.0_111'
Driver info: org.openqa.selenium.chrome.ChromeDriver

(The output in my scrapy shell is similar.)
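To tell a hub connection problem apart from a browser start-up failure like this one, the session can be written as a small script that first probes the hub's `/status` endpoint, which Selenium 2.x servers expose as part of the JSON wire protocol. This is only a sketch: the helper names are hypothetical, and the `desired_capabilities` keyword matches the Selenium 2.x/3.x Python bindings used in this post.

```python
import json
from http.client import HTTPException
from urllib.request import urlopen

HUB_URL = "http://127.0.0.1:4444/wd/hub"  # address printed in the server log


def status_url(hub_url):
    """Return the status endpoint for a hub URL, tolerating a trailing slash."""
    return hub_url.rstrip("/") + "/status"


def hub_is_up(hub_url, timeout=5):
    """Return True if the hub answers its status endpoint with valid JSON."""
    try:
        with urlopen(status_url(hub_url), timeout=timeout) as response:
            json.loads(response.read().decode("utf-8"))
        return True
    except (OSError, ValueError, HTTPException):
        # URLError and socket timeouts are OSError; malformed JSON is ValueError.
        return False


def open_remote_chrome(hub_url=HUB_URL):
    """Open a Chrome session through the hub (Selenium 2.x/3.x style call)."""
    # Imported here so the helpers above work without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    return webdriver.Remote(
        command_executor=hub_url,
        desired_capabilities=DesiredCapabilities.CHROME.copy(),
    )
```

If `hub_is_up(HUB_URL)` returns True but `open_remote_chrome()` still fails, the problem is more likely on the Chrome/chromedriver side than in the connection to the server. Whatever `open_remote_chrome()` returns should be closed with `quit()` in a `finally` block so that no browser process is left behind.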

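I've found scattered bits of information online about using xvfb or pyvirtualdisplay, but nobody seems to actually explain how I should go about fixing this. Can anyone on Stack Overflow help me?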
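For reference, a minimal sketch of the xvfb idea, under stated assumptions: a missing X display is one common reason for "Chrome failed to start" on a server without a screen, though the log above does not confirm that this is the cause here. With `webdriver.Remote`, Chrome is launched by the selenium-server process rather than by the scrapy shell, so the virtual display would have to be set in the server's environment. The display number `:99`, the screen geometry, and the helper names are all assumptions, and Xvfb must be installed.

```python
import os
import subprocess
import time


def xvfb_command(display=":99", screen="1280x1024x24"):
    """argv for a virtual X server; display number and geometry are assumptions."""
    return ["Xvfb", display, "-screen", "0", screen]


def env_with_display(display=":99", base_env=None):
    """Copy an environment and point DISPLAY at the virtual X server."""
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = display
    return env


def start_server_under_xvfb(server_argv, display=":99"):
    """Start Xvfb, then the Selenium server with DISPLAY set; return both processes."""
    xvfb = subprocess.Popen(xvfb_command(display))
    time.sleep(1)  # crude wait for Xvfb to come up; a real script should poll
    server = subprocess.Popen(server_argv, env=env_with_display(display))
    return xvfb, server
```

Calling `start_server_under_xvfb(["java", "-jar", "selenium-server-standalone-2.53.1.jar"])` in the first SSH window would replace the plain `java -jar` command. Both returned processes need to be terminated when the crawl is finished.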

Some additional context:

The server I'm running on is Google Cloud Service ("Debian GNU/Linux 8 (jessie)")

Java version is 1.7.0_111

Python version is 3.4.2 (via virtualenv; I installed scrapy & selenium through pip)

0 answers