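Community,
Quick introduction: I'm new to Python and am trying to run a Scrapy script to scrape a real estate website (www.immoweb.be). Unfortunately, the site is protected by Incapsula, which means I first have to run Selenium so that I can access the site through a real browser.
I SSH into my remote server and start selenium-server (agora is the project folder that contains the chromedriver binary):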
export PATH=$PATH:/home/<username>/agora
java -jar selenium-server-standalone-2.53.1.jar webdriver.chrome.driver=agora
Output:
14:53:51.308 INFO - Launching a standalone Selenium Server
14:53:51.539 INFO - Java: Oracle Corporation 24.111-b01
14:53:51.539 INFO - OS: Linux 3.16.0-4-amd64 amd64
14:53:51.602 INFO - v2.53.1, with Core v2.53.1. Built from revision a36b8b1
14:53:51.897 INFO - Driver provider org.openqa.selenium.ie.InternetExplorerDriver registration is skipped:
registration capabilities Capabilities [{platform=WINDOWS, ensureCleanSession=true, browserName=internet explorer, version=}] does not match the current platform LINUX
14:53:51.897 INFO - Driver provider org.openqa.selenium.edge.EdgeDriver registration is skipped:
registration capabilities Capabilities [{platform=WINDOWS, browserName=MicrosoftEdge, version=}] does not match the current platform LINUX
14:53:51.898 INFO - Driver class not found: com.opera.core.systems.OperaDriver
14:53:51.898 INFO - Driver provider com.opera.core.systems.OperaDriver is not registered
14:53:51.899 INFO - Driver provider org.openqa.selenium.safari.SafariDriver registration is skipped:
registration capabilities Capabilities [{platform=MAC, browserName=safari, version=}] does not match the current platform LINUX
14:53:51.918 INFO - Driver class not found: org.openqa.selenium.htmlunit.HtmlUnitDriver
14:53:51.918 INFO - Driver provider org.openqa.selenium.htmlunit.HtmlUnitDriver is not registered
14:53:52.133 INFO - RemoteWebDriver instances should connect to: http://127.0.0.1:4444/wd/hub
14:53:52.133 INFO - Selenium Server is up and running
I don't see anything out of the ordinary here. I then start Scrapy inside a virtualenv (in a second SSH window):
scrapy shell
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
driver = webdriver.Remote(
... command_executor='http://127.0.0.1:4444/wd/hub',
... desired_capabilities=DesiredCapabilities.CHROME)
This produces the following:
2016-12-28 15:03:25 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:4444/wd/hub/session {"requiredCapabilities": {}, "desiredCapabilities": {"version": "", "platform": "ANY", "browserName": "chrome", "javascriptEnabled": true}}
It then hangs for about a minute and throws the following error in my selenium-server shell:
Caused by: org.openqa.selenium.WebDriverException: unknown error: Chrome failed to start: exited abnormally (Driver info: chromedriver=2.27.440175 (9bc1d90b8bfa4dd181fbbf769a5eb5e575574320),platform=Linux 3.16.0-4-amd64 x86_64) (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 60.98 seconds
Build info: version: '2.53.1', revision: 'a36b8b1', time: '2016-06-30 17:37:03'
System info: host: 'agora', ip: '10.132.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '3.16.0-4-amd64', java.version: '1.7.0_111'
Driver info: org.openqa.selenium.chrome.ChromeDriver
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.openqa.selenium.remote.ErrorHandler.createThrowable(ErrorHandler.java:206)
at org.openqa.selenium.remote.ErrorHandler.throwIfResponseFailed(ErrorHandler.java:158)
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:678)
at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:249)
at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:131)
at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:144)
at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:170)
at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:138)
... 14 more
15:04:28.463 WARN - Exception: unknown error: Chrome failed to start: exited abnormally (Driver info: chromedriver=2.27.440175 (9bc1d90b8bfa4dd181fbbf769a5eb5e575574320),platform=Linux 3.16.0-4-amd64 x86_64) (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 60.98 seconds
Build info: version: '2.53.1', revision: 'a36b8b1', time: '2016-06-30 17:37:03'
System info: host: 'agora', ip: '10.132.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '3.16.0-4-amd64', java.version: '1.7.0_111'
Driver info: org.openqa.selenium.chrome.ChromeDriver
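(The output in my scrapy shell is similar.)
I have found scattered bits of information online about using xvfb or pyvirtualdisplay, but nobody seems to actually explain how I should fix this. Can anyone on Stack Overflow help me?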
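To narrow things down, here is a minimal diagnostic sketch. It assumes (this is not confirmed by the logs above) that "Chrome failed to start" is caused by the server having no X display, which is the situation xvfb and pyvirtualdisplay are meant to address. The helper name `missing_prerequisites` and the default binary names are hypothetical, chosen only for illustration; the code uses nothing beyond the Python standard library.

```python
import os
import shutil


def missing_prerequisites(binaries=("Xvfb", "chromedriver"), env=None):
    """Return a list of human-readable problems; an empty list means none found.

    `binaries` and `env` are illustrative parameters (assumptions), so the
    check can be run against a fake environment as well as the real one.
    """
    env = os.environ if env is None else env
    problems = []
    # Without a DISPLAY, a non-headless Chrome has no X server to draw on.
    if not env.get("DISPLAY"):
        problems.append("DISPLAY is not set (no X display for Chrome)")
    # Check that each required executable can be found on the PATH.
    for name in binaries:
        if shutil.which(name, path=env.get("PATH")) is None:
            problems.append(name + " not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Running this in the same SSH session that starts selenium-server would show whether a display and the expected binaries are visible to that process. An empty result does not prove Chrome will start; it only rules out these two causes.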
Some additional background:
The server I am running on is Google Cloud Service ("Debian GNU/Linux 8 (jessie)")
Java version is 1.7.0_111
Python version is 3.4.2 (via virtualenv - I installed scrapy & selenium through pip)
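One more detail that may be worth checking on the Java side: JVM system properties are passed as `-Dname=value` before `-jar`, whereas anything after the jar name is handed to the program as an ordinary argument. The sketch below builds such a command line. It assumes that `webdriver.chrome.driver` should point at the chromedriver executable itself rather than at its folder; the helper name `selenium_server_command` and the `/opt/agora` path are hypothetical. Since the log above shows chromedriver 2.27 did start (presumably found via PATH), this is probably not the cause of the failure.

```python
import os


def selenium_server_command(jar, chromedriver_dir, driver_name="chromedriver"):
    """Build the argv list for launching the standalone Selenium server."""
    # Assumption: the property should name the executable, not its directory.
    driver_path = os.path.join(chromedriver_dir, driver_name)
    # JVM options such as -D must come before -jar; arguments placed after
    # the jar name are passed to the application instead of the JVM.
    return ["java", "-Dwebdriver.chrome.driver=" + driver_path, "-jar", jar]


if __name__ == "__main__":
    command = selenium_server_command(
        "selenium-server-standalone-2.53.1.jar", "/opt/agora")
    print(" ".join(command))
```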