Question

The spider works fine from the command line, but running it from a script produces this error:

2016-03-30 03:47:59 [scrapy] INFO: Scrapy 1.0.5 started (bot: scrapybot)
2016-03-30 03:47:59 [scrapy] INFO: Optional features available: ssl, http11
2016-03-30 03:47:59 [scrapy] INFO: Overridden settings: {'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'}

Traceback (most recent call last):
  File "/home/ahmeds/scrapProject/crawler/startcrawls.py", line 11, in <module>
    process.crawl(onioncrawl)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 150, in crawl
    crawler = self._create_crawler(crawler_or_spidercls)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 166, in _create_crawler
    return Crawler(spidercls, self.settings)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 32, in __init__
    self.spidercls.update_settings(self.settings)
AttributeError: 'module' object has no attribute 'update_settings'

This is the code I use to run the crawler from a script, following the latest documentation. My Scrapy version is 1.0.5.

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from crawler.spiders import onioncrawl

setting = get_project_settings()
process = CrawlerProcess(setting)
process.crawl(onioncrawl)
process.start()
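The traceback shows where this breaks: Crawler.__init__ calls self.spidercls.update_settings(self.settings), and update_settings is a classmethod defined on Scrapy's Spider class. Because the onioncrawl module is passed instead of the spider class inside it, that attribute lookup happens on a module object. The stdlib-only sketch below reproduces the failure. FakeSpider, OnionSpider and create_crawler are hypothetical stand-ins, not Scrapy's real code.

```python
import types


class FakeSpider(object):
    """Hypothetical stand-in for scrapy.Spider (not Scrapy's real class)."""
    custom_settings = None

    @classmethod
    def update_settings(cls, settings):
        # Roughly what Spider.update_settings does: merge per-spider settings.
        settings.update(cls.custom_settings or {})


class OnionSpider(FakeSpider):
    """Hypothetical spider class, as it might be defined in onioncrawl.py."""
    custom_settings = {"DOWNLOAD_DELAY": 1}


def create_crawler(spidercls, settings):
    # Mirrors the failing line in the traceback: spidercls.update_settings(...)
    spidercls.update_settings(settings)
    return spidercls


# Simulate the onioncrawl module that contains the spider class.
onioncrawl = types.ModuleType("onioncrawl")
onioncrawl.OnionSpider = OnionSpider

settings = {}
create_crawler(onioncrawl.OnionSpider, settings)  # works: a class is passed
print(settings)

try:
    create_crawler(onioncrawl, {})  # fails: a module is passed
except AttributeError as exc:
    print("AttributeError:", exc)
```

On Python 2 the message matches the one in the question; Python 3 words it slightly differently ("module 'onioncrawl' has no attribute 'update_settings'").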

Answer 1

I was passing the spider's file name (the module) instead of the spider's class name.

Answer 2

You can try

process.crawl(onioncrawl.<ClassName>)

replacing <ClassName> with the actual name of the spider class defined in the onioncrawl module.
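If you are not sure what the class in onioncrawl is called, it can be looked up programmatically. Scrapy's own spider loader relies on a similar helper, scrapy.utils.spider.iter_spider_classes; the sketch below reimplements the idea with the standard library only so it runs without Scrapy. FakeSpider, OnionCrawlSpider, find_spider_classes and the simulated module are hypothetical stand-ins.

```python
import inspect
import types


class FakeSpider(object):
    """Hypothetical stand-in for scrapy.Spider."""
    name = None


def find_spider_classes(module, base=FakeSpider):
    """Return the named subclasses of `base` defined in `module` (a sketch)."""
    found = []
    for _, obj in inspect.getmembers(module, inspect.isclass):
        # Keep subclasses of the base class that were defined in this module
        # (not merely imported into it) and that set a spider name.
        if (issubclass(obj, base)
                and obj.__module__ == module.__name__
                and getattr(obj, "name", None)):
            found.append(obj)
    return found


class OnionCrawlSpider(FakeSpider):
    __module__ = "onioncrawl"  # pretend this class was defined in onioncrawl.py
    name = "onioncrawl"


# Simulated onioncrawl module: it imports the base class and defines one spider.
onioncrawl = types.ModuleType("onioncrawl")
onioncrawl.FakeSpider = FakeSpider
onioncrawl.OnionCrawlSpider = OnionCrawlSpider

spider_classes = find_spider_classes(onioncrawl)
print([cls.__name__ for cls in spider_classes])
# With Scrapy, a match could then be passed on:
# process.crawl(spider_classes[0])
```

The imported base class is skipped because its __module__ is not onioncrawl, which is the same reason Scrapy's helper does not pick up scrapy.Spider itself.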

Answer 3

I ran into the same problem with Python 2.7.x and Scrapy 1.7.x.

The following change fixed it for me:

process.crawl(onioncrawl.ClassName)

Answer 4

If you are using Python 3 and your project structure looks like this:

crawler
----spiders
--------onioncrawl.py (class onioncrawl is defined in this file)

try importing the class itself and passing it to process.crawl:

from crawler.spiders.onioncrawl import onioncrawl
process.crawl(onioncrawl)
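Because the module and the class share the name onioncrawl, the two import forms are easy to confuse. The sketch below (Python 3, standard library only) writes a throwaway crawler/spiders/onioncrawl.py to a temporary directory with a placeholder class named onioncrawl, then imports it both ways: from crawler.spiders import onioncrawl binds the module, which is what the question passed to process.crawl, whereas from crawler.spiders.onioncrawl import onioncrawl binds the class. The temporary package and its contents are illustrative only.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package mirroring the layout above:
# crawler/spiders/onioncrawl.py defining a class also called onioncrawl.
root = tempfile.mkdtemp()
spiders_dir = os.path.join(root, "crawler", "spiders")
os.makedirs(spiders_dir)
for pkg in (os.path.join(root, "crawler"), spiders_dir):
    open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(spiders_dir, "onioncrawl.py"), "w") as f:
    f.write("class onioncrawl(object):\n    name = 'onioncrawl'\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# What the question does: this binds the *module* onioncrawl.py.
from crawler.spiders import onioncrawl as imported_module
# What this answer suggests: this binds the *class* inside that module.
from crawler.spiders.onioncrawl import onioncrawl as imported_class

print(type(imported_module).__name__)  # module
print(type(imported_class).__name__)   # type
```

Only the second object is a class, so only it has the update_settings classmethod that CrawlerProcess.crawl ends up calling.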

AttributeError: 'module' object has no attribute 'update_settings' (Scrapy 1.0.5)
