Question

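I am trying to understand how to scrape website pages that have JavaScript content. I am stuck with Selenium, and I think I am misunderstanding some of the mechanism. I have this script: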

import scrapy
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

class britaudioSpider(scrapy.Spider):
    name = "britaudio"
    allowed_domains = ["http://www.example.com"]
    start_urls = ["www.example.com/archive-project"]

    def __init__(self):
        caps = DesiredCapabilities.FIREFOX
        caps["marionette"] = True
        caps["binary"] = "/usr/bin/firefox"

        driver = webdriver.Firefox(capabilities=caps)

    def parse(self, response):
        self.driver.get(response.url)
        el1 = self.driver.find_element_by_xpath('//ul[@class="level1"]/li[@class]/href')
        el1.click()
        el2 = self.driver.find_element_by_xpath('//id[@class="subNavContainer loaded"/ul[@class="level2"]/li[@class]/href')
        el2.click()
        el3 = self.driver.find_element_by_xpath('//id[@class="subNavContainer loaded"/ul[@class="level3"]/li[@class="track"]/href')
        print el3

This gives me the following error:

  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/common/service.py", line 62, in start
    stdout=self.log_file, stderr=self.log_file)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
exceptions.OSError: [Errno 20] Not a directory
Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.firefox.service.Service object at 0x7f050dd751d0>> ignored

I do not really understand how the webdriver works. Can someone help?

I have already looked at this question: Python Selenium Exception AttributeError: "'Service' object has no attribute 'process'" in selenium.webdriver.ie.service.Service, but it does not seem to apply to my case.
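
For context, the traceback shows the OSError being raised inside Service.start, in the subprocess call that launches the driver executable, so it happens before any page is requested. With "marionette" enabled, Selenium has to start a separate geckodriver process. One possible cause of "[Errno 20] Not a directory" in that situation (an assumption, not confirmed for this setup) is a PATH entry that points at a file, such as the geckodriver binary itself, rather than at the directory containing it. A minimal diagnostic sketch using only the standard library follows; the helper name diagnose_driver_path is hypothetical and not part of Selenium or Scrapy.

```python
import os


def diagnose_driver_path(executable="geckodriver", path=None):
    """Inspect a PATH string for problems that can break driver start-up.

    Hypothetical helper for illustration only.
    Returns a dict with:
      "not_directories": PATH entries that are missing or are not directories
      "found": full paths where an executable file named `executable` exists
    """
    if path is None:
        path = os.environ.get("PATH", "")

    not_directories = []
    found = []
    for entry in path.split(os.pathsep):
        if not entry:
            continue  # skip empty entries such as a trailing separator
        if not os.path.isdir(entry):
            # Either the entry does not exist or it is a file, not a directory.
            not_directories.append(entry)
            continue
        candidate = os.path.join(entry, executable)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            found.append(candidate)

    return {"not_directories": not_directories, "found": found}


if __name__ == "__main__":
    report = diagnose_driver_path()
    print("PATH entries that are not directories: %s" % (report["not_directories"],))
    print("geckodriver candidates: %s" % (report["found"],))
```

If the first list contains an entry that ends in the driver binary's name, replacing it with the containing directory would be the first thing to try. If the second list is empty, the driver executable is probably not reachable through PATH at all.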

AttributeError: "'Service' object has no attribute 'process'" with Selenium and Scrapy

0 answers: