Python exceptions.AttributeError: 'module' object has no attribute 'from_settings'

Time: 2014-01-18 15:33:14

Tags: python python-2.7 scrapy

I am using Scrapy 0.20 and Python 2.7.

I want to know whether the link I am currently scraping has already been scraped.

I searched a lot and found this example: "how to filter duplicate requests based on url in scrapy".

I copied the code, put it in my spiders folder, and changed the settings, but I get this exception:

Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 1237, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
  File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 1099, in _inlineCallbacks
    result = g.send(result)
  File "C:\Python27\lib\site-packages\scrapy-0.20.2-py2.7.egg\scrapy\crawler.py", line 66, in start
    yield self.engine.open_spider(self._spider, self._start_requests())
  File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 1237, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 1099, in _inlineCallbacks
    result = g.send(result)
  File "C:\Python27\lib\site-packages\scrapy-0.20.2-py2.7.egg\scrapy\core\engine.py", line 221, in open_spider
    scheduler = self.scheduler_cls.from_crawler(self.crawler)
  File "C:\Python27\lib\site-packages\scrapy-0.20.2-py2.7.egg\scrapy\core\scheduler.py", line 25, in from_crawler
    dupefilter = dupefilter_cls.from_settings(settings)
exceptions.AttributeError: 'module' object has no attribute 'from_settings'

My code:

import os

from scrapy.dupefilter import RFPDupeFilter
from scrapy.utils.request import request_fingerprint

class CustomFilter(RFPDupeFilter):
    def __getid(self, url):
        mm = url.split("&refer")[0] #or something like that
        return mm

    def request_seen(self, request):
        print "SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS"
        fp = self.__getid(request.url)
        if fp in self.fingerprints:
            return True
        self.fingerprints.add(fp)
        if self.file:
            self.file.write(fp + os.linesep)

In the settings I added this:

DUPEFILTER_CLASS = 'myproject.spiders.CustomFilter'
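
The traceback shows that the object Scrapy obtained from DUPEFILTER_CLASS is a module, not a class: the call dupefilter_cls.from_settings(settings) fails with "'module' object has no attribute 'from_settings'". One plausible cause, assuming the code was saved as CustomFilter.py inside the spiders folder, is that the dotted path stops at the module name and never reaches the class. The sketch below reproduces that situation without Scrapy. The fake in-memory modules and the helpers register_fake_module and resolve_dotted_path are hypothetical stand-ins written for this illustration, not Scrapy's own loader.

```python
import importlib
import sys
import types


class CustomFilter(object):
    """Stand-in for the filter class; only the classmethod matters here."""

    @classmethod
    def from_settings(cls, settings):
        return cls()


def register_fake_module(name, **attrs):
    """Hypothetical helper: register an in-memory module under `name`."""
    module = types.ModuleType(name)
    module.__path__ = []  # mark it as a package so submodules can hang off it
    module.__dict__.update(attrs)
    sys.modules[name] = module
    return module


# Assumed layout: myproject/spiders/CustomFilter.py containing class CustomFilter.
register_fake_module("myproject")
spiders_pkg = register_fake_module("myproject.spiders")
filter_module = register_fake_module(
    "myproject.spiders.CustomFilter", CustomFilter=CustomFilter
)
# Importing a submodule normally binds it as an attribute of its parent package.
spiders_pkg.CustomFilter = filter_module


def resolve_dotted_path(path):
    """Simplified illustration: import the part before the last dot,
    then look up the last part as an attribute of that module."""
    module_path, _, attr_name = path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# The path as written in the question's settings stops at the module.
as_configured = resolve_dotted_path("myproject.spiders.CustomFilter")
# Appending the class name reaches the class itself.
with_class_name = resolve_dotted_path("myproject.spiders.CustomFilter.CustomFilter")

print("%s %s" % (type(as_configured).__name__, hasattr(as_configured, "from_settings")))
print("%s %s" % (type(with_class_name).__name__, hasattr(with_class_name, "from_settings")))
```

If that assumption about the file name holds, a setting of the form 'myproject.spiders.CustomFilter.CustomFilter' (module path followed by the class name) would be the thing to try. Moving the filter out of the spiders folder into its own module is another option; the setting then has to point at the new module and class.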

0 answers:

No answers yet.