How does Scrapy find a Spider class by its name?

Asked: 2014-04-02 13:59:11

Tags: python scrapy

Say I have this spider:

class SomeSpider(Spider):
    name = 'spname'

Then I can run the spider by creating a new instance of SomeSpider and handing it to the crawler like this:

spider = SomeSpider()
crawler = Crawler(settings)
crawler.configure()
crawler.crawl(spider)
....

Can I do the same thing using only the spider name, i.e. 'spname'?

crawler.crawl('spname')  # pass just the spider name here

How can I create the spider dynamically? I assume Scrapy's spider manager does this internally, because this works fine:

scrapy crawl spname

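One option would be to walk my spiders folder, collect every Spider class and filter them by the name attribute. But that looks like a far-fetched solution!

Thanks in advance for your help.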

2 Answers:

Answer 0: (score: 3)

Take a look at the source code:

# scrapy/commands/crawl.py

class Command(ScrapyCommand):

    def run(self, args, opts):
        ...

# scrapy/spidermanager.py

class SpiderManager(object):

    def _load_spiders(self, module):
        ...

    def create(self, spider_name, **spider_kwargs):
        ...

# scrapy/utils/spider.py

def iter_spider_classes(module):
    """Return an iterator over all spider classes defined in the given module
    that can be instantiated (ie. which have name)
    """
    ...
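To make the idea behind those three pieces concrete, here is a minimal stdlib-only sketch of the same pattern: scan a module for classes that declare a name, then build a name-to-class mapping. It is not Scrapy's source. The `Spider` base class, `iter_named_classes`, `build_registry` and the in-memory `demo_spiders` module are all hypothetical stand-ins.

```python
import inspect
import types


class Spider:
    """Hypothetical stand-in for Scrapy's Spider base class."""
    name = None


def iter_named_classes(module, base=Spider):
    """Yield classes defined in `module` that subclass `base` and set a name."""
    for obj in vars(module).values():
        if (
            inspect.isclass(obj)
            and issubclass(obj, base)
            and obj.__module__ == module.__name__  # skip classes merely imported here
            and getattr(obj, "name", None)         # skip classes without a name
        ):
            yield obj


def build_registry(modules, base=Spider):
    """Map each declared name to its class, across all given modules."""
    return {
        cls.name: cls
        for module in modules
        for cls in iter_named_classes(module, base)
    }


# Build a throwaway module in memory so the sketch runs without a project.
demo = types.ModuleType("demo_spiders")
demo.Spider = Spider
exec(
    "class SomeSpider(Spider):\n"
    "    name = 'spname'\n"
    "\n"
    "class Unnamed(Spider):\n"
    "    pass\n",
    demo.__dict__,
)

registry = build_registry([demo])
spider = registry["spname"]()  # look the class up by name, then instantiate it
```

`Unnamed` is left out of the registry because it never sets `name`, which mirrors the "which have name" note in the docstring quoted above.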

Answer 1: (score: 1)

Inspired by @kev's answer, here is a function that collects the spider classes:

from scrapy.utils.misc import walk_modules
from scrapy.utils.spider import iter_spider_classes

def _load_spiders(self, path='spiders.SomeSpider'):
    # Meant to live on a class that owns a `_spiders` dict (name -> spider class).
    for module in walk_modules(path):
        for spcls in iter_spider_classes(module):
            self._spiders[spcls.name] = spcls

Then you can instantiate the spider by name:

somespider = self._spiders['spname']()
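If you want to see the "walk the package, then look up by name" step without Scrapy installed, the following is a rough stdlib-only sketch. `walk_package` and `find_class_by_name` are hypothetical helpers written for this example (they only approximate what `walk_modules` does), and the `demo_spiders_pkg` package is generated in a temp directory purely for the demo.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def walk_package(path):
    """Import `path` and, if it is a package, every submodule below it."""
    root = importlib.import_module(path)
    modules = [root]
    if hasattr(root, "__path__"):  # only packages have __path__
        prefix = root.__name__ + "."
        for info in pkgutil.walk_packages(root.__path__, prefix=prefix):
            modules.append(importlib.import_module(info.name))
    return modules


def find_class_by_name(package, spider_name):
    """Return the first class under `package` whose `name` attribute matches."""
    for module in walk_package(package):
        for obj in vars(module).values():
            if isinstance(obj, type) and getattr(obj, "name", None) == spider_name:
                return obj
    raise KeyError(spider_name)


# Demo: write a tiny throwaway package to a temp directory.
tmp = Path(tempfile.mkdtemp())
pkg = tmp / "demo_spiders_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "some.py").write_text("class SomeSpider:\n    name = 'spname'\n")
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

spider_cls = find_class_by_name("demo_spiders_pkg", "spname")
spider = spider_cls()
```

As far as I know, later Scrapy releases also let you pass the spider name straight to `CrawlerProcess.crawl`, so check the docs for your version before rolling your own lookup.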