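I am trying to modify Scrapy's RetryMiddleware class by overriding the _retry method with a copy-pasted version to which I add just one line. I tried to start the custom middleware module as follows: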
import scrapy.downloadermiddlewares.retry
from scrapy.utils.python import global_object_name
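However, this produces an
ImportError: cannot import name global_object_name
According to "ImportError: Cannot import name X", this type of error is caused by circular imports, but in this case I cannot easily remove the dependency from the Scrapy source code. How can I solve this?
For completeness, here is the TorRetryMiddleware I am trying to implement: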
import logging

import scrapy.downloadermiddlewares.retry
from scrapy.utils.python import global_object_name

import apkmirror_scraper.tor_controller as tor_controller

logger = logging.getLogger(__name__)


class TorRetryMiddleware(scrapy.downloadermiddlewares.retry.RetryMiddleware):
    def __init__(self, settings):
        super(TorRetryMiddleware, self).__init__(settings)
        self.retry_http_codes = {403, 429}  # Retry on 403 ('Forbidden') and 429 ('Too Many Requests')

    def _retry(self, request, reason, spider):
        '''Same as original '_retry' method, but with a call to 'change_identity' before returning the Request.'''
        retries = request.meta.get('retry_times', 0) + 1
        stats = spider.crawler.stats
        if retries <= self.max_retry_times:
            logger.debug("Retrying %(request)s (failed %(retries)d times): %(reason)s",
                         {'request': request, 'retries': retries, 'reason': reason},
                         extra={'spider': spider})
            retryreq = request.copy()
            retryreq.meta['retry_times'] = retries
            retryreq.dont_filter = True
            retryreq.priority = request.priority + self.priority_adjust
            if isinstance(reason, Exception):
                reason = global_object_name(reason.__class__)
            stats.inc_value('retry/count')
            stats.inc_value('retry/reason_count/%s' % reason)
            tor_controller.change_identity()  # This line is added to the original '_retry' method
            return retryreq
        else:
            stats.inc_value('retry/max_reached')
            logger.debug("Gave up retrying %(request)s (failed %(retries)d times): %(reason)s",
                         {'request': request, 'retries': retries, 'reason': reason},
                         extra={'spider': spider})
Answer 0 (score: 4)
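Personally, I don't think the ImportError comes from a circular import. Rather, it is quite likely that your version of Scrapy does not yet contain scrapy.utils.python.global_object_name.
scrapy.utils.python.global_object_name did not exist until this commit, which is not yet part of any existing release (the latest release is v1.3.3), although it is targeted at v1.4.
Please verify that you are actually using the code from GitHub, and that your copy does include that commit.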
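If you cannot switch to a Scrapy build that contains the commit, one possible workaround is a guarded import with a local fallback. This is only a sketch: the body of the fallback (module name joined with the class name) is my assumption about what the helper does, not something confirmed from the Scrapy source here, so check it against the commit before relying on it.

```python
# Guarded import: use Scrapy's helper when the installed version has it,
# otherwise fall back to a local stand-in.
try:
    from scrapy.utils.python import global_object_name
except ImportError:
    def global_object_name(obj):
        # ASSUMPTION: approximates the Scrapy helper by returning the
        # dotted "module.ClassName" path of a class or function.
        return "%s.%s" % (obj.__module__, obj.__name__)


# Example: the kind of value '_retry' would use as the stats key suffix.
reason_name = global_object_name(ValueError)
print(reason_name)
```

With this at the top of the middleware module, the rest of the TorRetryMiddleware code can stay unchanged on both older and newer Scrapy versions.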
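EDIT:
Regarding:
According to "ImportError: Cannot import name X", this type of error is caused by circular imports,
An ImportError can have many causes. Usually the stack trace is enough to determine the root cause, e.g.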
>>> import no_such_name
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named no_such_name
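whereas a circular import should produce a noticeably different stack trace, in which the modules importing each other both appear, e.g.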
[pengyu@GLaDOS-Precision-7510 tmp]$ cat foo.py
from bar import baz
baz = 1
[pengyu@GLaDOS-Precision-7510 tmp]$ cat bar.py
from foo import baz
baz = 2
[pengyu@GLaDOS-Precision-7510 tmp]$ python -c "import foo"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/foo.py", line 1, in <module>
    from bar import baz
  File "/tmp/bar.py", line 1, in <module>
    from foo import baz
ImportError: cannot import name 'baz'
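To reproduce the circular case without touching your project, a small script along these lines should work. It is a sketch: the helper name run_circular_import is made up for illustration, and the exact wording of the error message varies between Python versions, so the script only prints the traceback rather than expecting a specific text.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_circular_import():
    """Create foo.py and bar.py that import each other, then import foo."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "foo.py").write_text("from bar import baz\nbaz = 1\n")
        Path(tmp, "bar.py").write_text("from foo import baz\nbaz = 2\n")
        # Run in a child interpreter whose working directory is the temp dir,
        # so that 'import foo' resolves to the file created above.
        result = subprocess.run(
            [sys.executable, "-c", "import foo"],
            cwd=tmp,
            capture_output=True,
            text=True,
        )
    return result.returncode, result.stderr


returncode, stderr = run_circular_import()
print(stderr)
```

The point to look for in the printed traceback is that both foo.py and bar.py show up as frames. In the original question only the Scrapy import line fails, which is what you would expect from a missing name rather than a cycle.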