Question

I'm trying to run a spider from a Python script, following the Scrapy documentation: http://doc.scrapy.org/en/latest/topics/practices.html

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from testspiders.spiders.followall import FollowAllSpider
from scrapy.utils.project import get_project_settings

spider = FollowAllSpider(domain='scrapinghub.com')
settings = get_project_settings()
crawler = Crawler(settings)
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run() # the script will block here until the spider_closed signal was sent

But Python simply can't import the module. The error is shown below:

Traceback (most recent call last):
...
    from scrapy.crawler import Crawler
  File "aappp/scrapy.py", line 1, in <module>
ImportError: No module named crawler

This problem is briefly mentioned in the FAQ of the Scrapy documentation, but that didn't help me much.
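The traceback points at aappp/scrapy.py, which suggests the script itself is named scrapy.py. A common cause of exactly this error (an inference from the traceback, not something stated in the thread) is that a local file named scrapy.py shadows the installed scrapy package, so Python finds a plain module instead of the package and `scrapy.crawler` cannot be resolved. The sketch below reproduces the effect with a throwaway package called `fakescrapy` (a hypothetical name) rather than real Scrapy:

```python
import importlib
import os
import sys
import tempfile


def demo_shadowing():
    """Show a same-named local module hiding a package on sys.path."""
    names = ("fakescrapy", "fakescrapy.crawler")
    with tempfile.TemporaryDirectory() as root:
        installed = os.path.join(root, "site")   # stands in for site-packages
        project = os.path.join(root, "project")  # stands in for the script's folder
        os.makedirs(os.path.join(installed, "fakescrapy"))
        os.makedirs(project)
        # The "real" package, which has a crawler submodule.
        open(os.path.join(installed, "fakescrapy", "__init__.py"), "w").close()
        with open(os.path.join(installed, "fakescrapy", "crawler.py"), "w") as f:
            f.write("class Crawler:\n    pass\n")
        # A user script that happens to share the package's name.
        with open(os.path.join(project, "fakescrapy.py"), "w") as f:
            f.write("# user script\n")

        results = {}
        # "shadowed": script folder comes first; "fixed": only the package is visible.
        for label, entries in [("shadowed", [project, installed]), ("fixed", [installed])]:
            saved_path = list(sys.path)
            sys.path[:0] = entries
            for name in names:  # start each attempt from a clean module cache
                sys.modules.pop(name, None)
            importlib.invalidate_caches()
            try:
                from fakescrapy.crawler import Crawler  # noqa: F401
                results[label] = "ok"
            except ImportError as exc:
                results[label] = type(exc).__name__
            finally:
                sys.path[:] = saved_path
                for name in names:
                    sys.modules.pop(name, None)
        return results


print(demo_shadowing())
```

If this is the cause, renaming the script (and deleting any stale scrapy.pyc left next to it) should let `from scrapy.crawler import Crawler` resolve to the installed package again.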

Answer 1

Have you tried this?

from scrapy.project import crawler

(That's from http://doc.scrapy.org/en/latest/faq.html. It looks like they have already answered your question there.)

The FAQ also describes a newer approach and deprecates the old one:

"This way to access the crawler object is deprecated; the code should be ported to use the from_crawler class method, for example:

class SomeExtension(object):

    @classmethod
    def from_crawler(cls, crawler):
        o = cls()
        o.crawler = crawler
        return o

"
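To show how the from_crawler pattern works without needing Scrapy installed, here is a minimal sketch. `StubCrawler`, the `SOMEEXT_ENABLED` setting and the `build` helper are hypothetical stand-ins: in real Scrapy the framework itself calls `from_crawler` when it instantiates an extension and passes its actual Crawler object, so the extension never needs a global `scrapy.project.crawler`.

```python
class StubCrawler:
    """Hypothetical stand-in for scrapy.crawler.Crawler; it only carries settings."""

    def __init__(self, settings):
        self.settings = settings


class SomeExtension(object):
    def __init__(self, enabled=True):
        self.enabled = enabled
        self.crawler = None

    @classmethod
    def from_crawler(cls, crawler):
        # Read configuration from the crawler that is handed in,
        # instead of importing a global crawler singleton.
        o = cls(enabled=crawler.settings.get("SOMEEXT_ENABLED", True))
        o.crawler = crawler
        return o


def build(extension_cls, crawler):
    """Hypothetical factory: prefer from_crawler when the class defines it."""
    if hasattr(extension_cls, "from_crawler"):
        return extension_cls.from_crawler(crawler)
    return extension_cls()


crawler = StubCrawler({"SOMEEXT_ENABLED": False})
ext = build(SomeExtension, crawler)
print(ext.enabled, ext.crawler is crawler)
```

The design point is dependency injection: each extension receives the crawler it belongs to, which also works when several crawlers run in the same process.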

Unable to import the scrapy module as a library
