I am trying to scrape with Selenium and webdriver under scrapyd. The spider works fine when run with the command "scrapy crawl myspider", but when I deploy it with scrapyd and schedule it via curl and the scrapyd API, it raises "unexpected keyword argument '_job'".
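For reference, the deploy-and-schedule sequence that triggers the failure looks roughly like this (the target name "default" and the project name "thevillages" are assumptions based on the import in the spider code below):

```shell
# Deploy the project to scrapyd (run from the project directory;
# "default" is whatever deploy target is configured in scrapy.cfg).
scrapyd-deploy default -p thevillages

# Schedule the spider through scrapyd's HTTP API. scrapyd attaches a
# job id and forwards it to the spider as the '_job' keyword argument,
# which is what triggers the TypeError shown in the log below.
curl http://localhost:6800/schedule.json -d project=thevillages -d spider=village
```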
Here is my spider code:
#!G:\python-2-7
import scrapy
from scrapy.spider import BaseSpider
from selenium import webdriver
from scrapy.http import TextResponse
import time
from time import sleep
import pickle
import math
from math import floor
from thevillages.items import ThevillagesItem
import MySQLdb
import sys
import json

class VillageSpider(BaseSpider):
    name = 'village'
    allowed_domains = ["example.com"]
    start_urls = ['https://www.example.com/']

    def __init__(self, *args, **kwargs):
        super(VillageSpider, self).__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    # def __init__(self):
    def parse(self, response):
        self.driver.get(response.url)
See the error log below:
2017-10-17 17:58:05 [twisted] CRITICAL: Unhandled error in Deferred:
2017-10-17 17:58:05 [twisted] CRITICAL:
Traceback (most recent call last):
  File "g:\python-2-7\lib\site-packages\twisted\internet\defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "g:\python-2-7\lib\site-packages\scrapy\crawler.py", line 95, in crawl
    six.reraise(*exc_info)
  File "g:\python-2-7\lib\site-packages\scrapy\crawler.py", line 76, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "g:\python-2-7\lib\site-packages\scrapy\crawler.py", line 99, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "g:\python-2-7\lib\site-packages\scrapy\spiders\__init__.py", line 54, in from_crawler
    spider = cls(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument '_job'
Answer 0 (score: 2)
You need to change your code to:
class VillageSpider(BaseSpider):
    name = 'village'
    allowed_domains = ["example.com"]
    start_urls = ['https://www.example.com/']

    def __init__(self, name=None, **kwargs):
        # scrapyd passes the job id as a '_job' keyword argument;
        # discard it before delegating to the base class. The None
        # default keeps plain `scrapy crawl` runs (no '_job') working.
        kwargs.pop('_job', None)
        super(VillageSpider, self).__init__(name, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
This way __init__ accepts the extra '_job' keyword that scrapyd injects when it schedules a job, discards it, and forwards the remaining arguments to the base class unchanged, so the signature stays compatible with the base class.
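The failure mode can be seen in isolation with plain Python, no Scrapy required. This is a minimal sketch (the class names are made up for illustration): a base class that rejects unknown keywords, a subclass that naively forwards **kwargs the way the original spider did, and a subclass that strips the scrapyd-injected '_job' first:

```python
class Base(object):
    # Stand-in for BaseSpider: accepts only a 'name' keyword.
    def __init__(self, name=None):
        self.name = name

class Broken(Base):
    def __init__(self, name=None, **kwargs):
        # Forwarding everything reproduces the scrapyd failure.
        super(Broken, self).__init__(name, **kwargs)

class Fixed(Base):
    def __init__(self, name=None, **kwargs):
        # Discard the scrapyd-injected job id before delegating;
        # the None default means a run without '_job' also works.
        kwargs.pop('_job', None)
        super(Fixed, self).__init__(name, **kwargs)

# scrapyd instantiates the spider roughly like cls(name, _job='a1b2c3')
try:
    Broken('village', _job='a1b2c3')
except TypeError as e:
    print(e)  # unexpected keyword argument '_job'

spider = Fixed('village', _job='a1b2c3')
print(spider.name)  # village
```

The same trick applies to any extra keyword a scheduler or framework injects: pop it (or accept it explicitly) before delegating to a base class with a stricter signature.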