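I am trying to understand how to scrape web pages with JavaScript content. I am stuck with Selenium, and I think I am misunderstanding how it works. I have this script: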
import scrapy
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

class britaudioSpider(scrapy.Spider):
    name = "britaudio"
    allowed_domains = ["http://www.example.com"]
    start_urls = ["www.example.com/archive-project"]

    def __init__(self):
        caps = DesiredCapabilities.FIREFOX
        caps["marionette"] = True
        caps["binary"] = "/usr/bin/firefox"
        driver = webdriver.Firefox(capabilities=caps)

    def parse(self, response):
        self.driver.get(response.url)
        el1 = self.driver.find_element_by_xpath('//ul[@class="level1"]/li[@class]/href')
        el1.click()
        el2 = self.driver.find_element_by_xpath('//id[@class="subNavContainer loaded"/ul[@class="level2"]/li[@class]/href')
        el2.click()
        el3 = self.driver.find_element_by_xpath('//id[@class="subNavContainer loaded"/ul[@class="level3"]/li[@class="track"]/href')
        print el3
This gives me the following error:
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/common/service.py", line 62, in start
    stdout=self.log_file, stderr=self.log_file)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
exceptions.OSError: [Errno 20] Not a directory
Exception AttributeError: "'Service' object has no attribute 'process'" in <bound method Service.__del__ of <selenium.webdriver.firefox.service.Service object at 0x7f050dd751d0>> ignored
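The last frames of the traceback are in `subprocess.py`, so the `[Errno 20] Not a directory` is raised while Selenium tries to start an external process, not while the page is being parsed. As a minimal sketch, and only under the assumption (not confirmed for this setup) that the executable path being launched passes through something that is not a directory, the same errno can be reproduced without Selenium on a POSIX system. The helper `try_launch` and the file name `geckodriver` are hypothetical and used purely for illustration:

```python
import errno
import os
import subprocess
import tempfile


def try_launch(executable_path):
    """Try to start a program; return the OSError errno, or None on success."""
    try:
        proc = subprocess.Popen([executable_path])
        proc.wait()
        return None
    except OSError as exc:
        return exc.errno


# Create a regular file, then build a path that treats it as a directory.
handle, regular_file = tempfile.mkstemp()
os.close(handle)
bogus_path = os.path.join(regular_file, "geckodriver")  # hypothetical name

result = try_launch(bogus_path)
os.remove(regular_file)

# True when the launch failed with "Not a directory" (errno 20 on Linux).
print(result == errno.ENOTDIR)
```

If this reproduces the message, it may be worth checking which executable path Selenium is trying to launch and whether every component before the final name is really a directory.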
I don't really understand how webdriver works. Can anyone help?
I have already looked at this question: Python Selenium Exception AttributeError: "'Service' object has no attribute 'process'" in selenium.webdriver.ie.service.Service, but it does not seem to apply to my case.