How do I fix a circular import when inheriting from Scrapy's RetryMiddleware class?

Date: 2017-05-15 10:33:52

Tags: python scrapy

I'm trying to modify Scrapy's RetryMiddleware class by overriding its _retry method with a copy-pasted version to which I add a single line. My custom middleware module starts as follows:

import scrapy.downloadermiddlewares.retry
from scrapy.utils.python import global_object_name

However, this produces an

ImportError: cannot import name global_object_name

According to ImportError: Cannot import name X, this kind of error is caused by a circular import, but in this case I can't easily remove the dependency from the Scrapy source code. How can I solve this?

For completeness, here is the TorRetryMiddleware I'm trying to implement:

import logging
import scrapy.downloadermiddlewares.retry
from scrapy.utils.python import global_object_name
import apkmirror_scraper.tor_controller as tor_controller

logger = logging.getLogger(__name__)

class TorRetryMiddleware(scrapy.downloadermiddlewares.retry.RetryMiddleware):
    def __init__(self, settings):
        super(TorRetryMiddleware, self).__init__(settings)
        self.retry_http_codes = {403, 429}                  # Retry on 403 ('Forbidden') and 429 ('Too Many Requests')

    def _retry(self, request, reason, spider):
        '''Same as original '_retry' method, but with a call to 'change_identity' before returning the Request.'''
        retries = request.meta.get('retry_times', 0) + 1

        stats = spider.crawler.stats
        if retries <= self.max_retry_times:
            logger.debug("Retrying %(request)s (failed %(retries)d times): %(reason)s",
                         {'request': request, 'retries': retries, 'reason': reason},
                         extra={'spider': spider})
            retryreq = request.copy()
            retryreq.meta['retry_times'] = retries
            retryreq.dont_filter = True
            retryreq.priority = request.priority + self.priority_adjust

            if isinstance(reason, Exception):
                reason = global_object_name(reason.__class__)

            stats.inc_value('retry/count')
            stats.inc_value('retry/reason_count/%s' % reason)

            tor_controller.change_identity()    # This line is added to the original '_retry' method      

            return retryreq
        else:
            stats.inc_value('retry/max_reached')
            logger.debug("Gave up retrying %(request)s (failed %(retries)d times): %(reason)s",
                         {'request': request, 'retries': retries, 'reason': reason},
                         extra={'spider': spider})
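
For reference, a middleware like this is typically registered in the project's settings.py by replacing the built-in RetryMiddleware. The module path and priority in the sketch below are assumptions for illustration, not taken from the question:

# settings.py -- minimal sketch; adjust the module path to wherever
# TorRetryMiddleware actually lives in the apkmirror_scraper project.
DOWNLOADER_MIDDLEWARES = {
    # Disable the stock retry middleware ...
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    # ... and register the custom TorRetryMiddleware in its place
    # (550 is the slot the built-in middleware normally occupies).
    'apkmirror_scraper.middlewares.TorRetryMiddleware': 550,
}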

1 Answer:

Answer 0 (score: 4)

I personally doubt that the ImportError comes from a circular import. Instead, it is most likely that your version of Scrapy does not yet include scrapy.utils.python.global_object_name.

scrapy.utils.python.global_object_name was only introduced in this commit, and it is not yet part of any released version (the latest release is v1.3.3), although it is targeted for v1.4.

Please verify that you are using the code from GitHub and that your copy indeed includes that commit.
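
For example, you can check which Scrapy you are actually importing, and guard the import with a local fallback while you are still on a 1.3.x release (a minimal sketch; the fallback simply rebuilds the 'module.ClassName' string that _retry uses for the stats key):

import scrapy
print(scrapy.__version__)   # e.g. '1.3.3' means the helper is not there yet

try:
    from scrapy.utils.python import global_object_name
except ImportError:
    # Not available before Scrapy 1.4; fall back to a local equivalent
    # that builds the same 'module.ClassName' string.
    def global_object_name(obj):
        return '%s.%s' % (obj.__module__, obj.__name__)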

EDIT:

Regarding:

According to ImportError: Cannot import name X, this kind of error is caused by a circular import,

There are many possible causes of an ImportError. Usually the traceback is enough to determine the root cause. E.g.

>>> import no_such_name
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named no_such_name

whereas a circular import produces a quite different traceback, e.g.

[pengyu@GLaDOS-Precision-7510 tmp]$ cat foo.py 
from bar import baz
baz = 1
[pengyu@GLaDOS-Precision-7510 tmp]$ cat bar.py 
from foo import baz
baz = 2
[pengyu@GLaDOS-Precision-7510 tmp]$ python -c "import foo"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/foo.py", line 1, in <module>
    from bar import baz
  File "/tmp/bar.py", line 1, in <module>
    from foo import baz
ImportError: cannot import name 'baz'
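
That is not what is happening in this question, but if you ever do hit a genuine circular import like the foo/bar example above, a common way to break the cycle is to defer one of the imports into the function that needs it, so the name is only looked up at call time (a minimal sketch; use_bar_baz is a hypothetical helper):

# foo.py -- deferred-import sketch
baz = 1

def use_bar_baz():
    from bar import baz as bar_baz   # deferred until call time, after both
    return bar_baz                    # modules have finished initializing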