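I have been banging my head against this problem for weeks. I've tried several approaches, but I can't get anything that works cleanly. Ideally, I need to check for a file when the spider opens and, depending on that file, stop execution. I could do this in the parse method, but that is ugly and hard to maintain. I will probably write a middleware for this eventually, but for now I just want to implement it in each of my spiders. Here is what I have so far: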
class MySpider(Spider):
    def __init__(self):
        dispatcher.connect(self.spider_opened, signals.spider_opened)

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(MySpider, cls).from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.spider_opened, signal=signals.spider_opened)
        return spider

    def spider_opened(self):
        raise CloseSpider("Testing force close")
This does not work. I get the following exception:
2018-06-15 13:05:46 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method ?.spider_opened of <MySpider 'myspider' at 0x10c450050>>
Traceback (most recent call last):
  File "/Users/.../Library/Python/2.7/lib/python/site-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "build/bdist.macosx-10.11-intel/egg/pydispatch/robustapply.py", line 55, in robustApply
  File "/Users/.../myspider.py", line 72, in spider_opened
    raise CloseSpider("Testing force close")
CloseSpider
In my IDE, pylint says:
E1101:Instance of 'Spider' has no 'spider_opened' member
Can anyone point me toward a solution? Is it because I'm running Scrapy v1.3.0?
Answer 0 (score: 0)
It should look like this:
from scrapy import Spider, signals
from scrapy.exceptions import CloseSpider
from scrapy.xlib.pydispatch import dispatcher


class MySpider(Spider):
    name = 'myspider'

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        # Connect the handler on the spider instance itself
        dispatcher.connect(self.spider_opened, signals.spider_opened)

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(MySpider, cls).from_crawler(crawler, *args, **kwargs)
        return spider

    def spider_opened(self):
        raise CloseSpider("Testing force close")
Note that you cannot declare crawler.signals.connect(spider.spider_opened, signal=signals.spider_opened) inside the classmethod, because spider_opened should be called on the spider instance.
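If the handler still only logs the exception, one thing worth trying is to skip the signal altogether. As far as I know, CloseSpider is meant to be raised from spider callbacks, which would match the traceback in the question, where the signal dispatcher catches and logs it. Below is a minimal sketch of a different technique: decide in start_requests() whether to queue any work at all. It assumes the crawl should stop when a marker file exists (invert the check otherwise) and that a spider with no start requests simply finishes. The names stop_file_present, guarded_start_urls and the path in the comment are hypothetical, not part of Scrapy.

```python
import os


def stop_file_present(path):
    """Return True when the marker file exists on disk."""
    return os.path.exists(path)


def guarded_start_urls(urls, stop_file):
    """Yield the given URLs only if the marker file is absent.

    Hypothetical helper: intended to be called from a spider's
    start_requests(), which wraps each URL in a scrapy.Request.
    """
    if stop_file_present(stop_file):
        return  # yield nothing, so no requests get scheduled
    for url in urls:
        yield url


# Intended use inside a spider (assumption, not run here):
#
#     def start_requests(self):
#         for url in guarded_start_urls(self.start_urls, "/tmp/stop.flag"):
#             yield scrapy.Request(url)
```

Keeping the file check in a plain function means it can be shared across spiders now and moved into a middleware or extension later without touching the parse methods.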