I'm trying out a Scrapy package, https://github.com/clemfromspace/scrapy-selenium.
I followed the instructions on the GitHub page above. I started a new Scrapy project and created a spider:
import scrapy
from scrapy_selenium import SeleniumRequest
from shutil import which

SELENIUM_DRIVER_NAME = 'firefox'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('geckodriver')
SELENIUM_DRIVER_ARGUMENTS = ['-headless']  # '--headless' if using chrome instead of firefox


class MySpider(scrapy.Spider):
    start_urls = ["http://yahoo.com"]
    name = 'test'

    def start_requests(self):
        # Issue each start URL as a SeleniumRequest so the Selenium middleware handles it
        for url in self.start_urls:
            yield SeleniumRequest(url=url, callback=self.parse_index_page)

    def parse_index_page(self, response):
        ...
I've downloaded the latest geckodriver and set its path as shown above.
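One thing worth checking at this point, as a rough sketch rather than part of my original setup: `shutil.which` returns `None` when the executable is not on `PATH`, which would leave `SELENIUM_DRIVER_EXECUTABLE_PATH` effectively unset. The helper name `resolve_driver` is hypothetical and not part of scrapy-selenium.

```python
from shutil import which


def resolve_driver(name="geckodriver"):
    """Return the full path of the driver executable, or raise if it is not on PATH.

    `resolve_driver` is a hypothetical helper used only for illustration.
    """
    path = which(name)  # None when the executable cannot be found on PATH
    if path is None:
        raise FileNotFoundError(f"{name} was not found on PATH")
    return path
```

Calling `resolve_driver()` before starting the crawl would fail early if geckodriver cannot be located, instead of passing `None` along as the driver path.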
The output contains:
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2019-07-05 14:14:44 [scrapy.middleware] WARNING: Disabled SeleniumMiddleware: SELENIUM_DRIVER_NAME and SELENIUM_DRIVER_EXECUTABLE_PATH must be set
2019-07-05 14:14:44 [scrapy.middleware] INFO: Enabled downloader middlewares:
2019-07-05 14:56:59 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
I don't see the Selenium downloader middleware in that list, and I do see:
WARNING: Disabled SeleniumMiddleware: SELENIUM_DRIVER_NAME and SELENIUM_DRIVER_EXECUTABLE_PATH must be set.
What am I doing wrong?
Edit:
I ended up putting:
# -*- coding: utf-8 -*-
import os

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

SELENIUM_DRIVER_NAME = 'firefox'
SELENIUM_DRIVER_EXECUTABLE_PATH = 'E:/ENVS/r3/scrapySelenium/geckodriver.exe'
SELENIUM_DRIVER_ARGUMENTS = []  # '--headless' if using chrome instead of firefox

# PATH entries are directories, so add the folder that contains geckodriver.exe
os.environ["PATH"] += os.pathsep + os.path.dirname(SELENIUM_DRIVER_EXECUTABLE_PATH)
os.environ["PATH"] += os.pathsep + '..../AppData/Local/Mozilla Firefox'

# Point Selenium at the Firefox binary and enable the Marionette driver
firefox_capabilities = DesiredCapabilities.FIREFOX
firefox_capabilities['marionette'] = True
firefox_capabilities['binary'] = '..../AppData/Local/Mozilla Firefox/firefox.exe'
driver = webdriver.Firefox(capabilities=firefox_capabilities)
in settings.py. I got a series of error messages along the way, but eventually got it working.
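For comparison, here is a minimal settings.py sketch along the lines of the scrapy-selenium README, assuming geckodriver is on `PATH`. Scrapy reads these values from the project's settings module, not from module-level variables in the spider file, which would explain the warning above. The `check_selenium_settings` helper is hypothetical; it only mirrors the condition named in the warning message and is not part of Scrapy or scrapy-selenium.

```python
# settings.py (sketch; the values are assumptions, adjust them to your environment)
from shutil import which

SELENIUM_DRIVER_NAME = "firefox"
SELENIUM_DRIVER_EXECUTABLE_PATH = which("geckodriver")  # None if not on PATH
SELENIUM_DRIVER_ARGUMENTS = ["-headless"]  # '--headless' if using chrome

# Enable the middleware, as described in the scrapy-selenium README
DOWNLOADER_MIDDLEWARES = {
    "scrapy_selenium.SeleniumMiddleware": 800,
}


# Hypothetical helper, shown only for illustration; it does not belong
# in a real settings.py and is not provided by any library.
def check_selenium_settings(settings):
    """Return the names of required Selenium settings that are missing or empty."""
    required = ("SELENIUM_DRIVER_NAME", "SELENIUM_DRIVER_EXECUTABLE_PATH")
    return [key for key in required if not settings.get(key)]
```

Passing a dict such as `{"SELENIUM_DRIVER_NAME": "firefox", "SELENIUM_DRIVER_EXECUTABLE_PATH": None}` to the helper would report the executable path as missing, which is the situation when `which('geckodriver')` finds nothing.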