Say I have this spider:

    class SomeSpider(Spider):
        name = 'spname'
Then I can crawl with my spider by creating a new instance of SomeSpider and calling the crawler like this:

    spider = SomeSpider()
    crawler = Crawler(settings)
    crawler.configure()
    crawler.crawl(spider)
    ....
Can I do the same thing using only the spider name? I mean 'spname'?

    crawler.crawl('spname')  # I give just the spider name here
How can I create the spider dynamically? I guess Scrapy's spider manager does this internally, since this works fine:

    scrapy crawl spname
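One solution would be to parse my spiders folder, collect all the Spider classes, and filter them by their name attribute. But that looks like a far-fetched solution!

Thanks in advance for your help.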
Answer 0 (score: 3)

Take a look at the source code:
    # scrapy/commands/crawl.py
    class Command(ScrapyCommand):
        def run(self, args, opts):
            ...

    # scrapy/spidermanager.py
    class SpiderManager(object):
        def _load_spiders(self, module):
            ...
        def create(self, spider_name, **spider_kwargs):
            ...

    # scrapy/utils/spider.py
    def iter_spider_classes(module):
        """Return an iterator over all spider classes defined in the given module
        that can be instantiated (ie. which have name)
        """
        ...
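
To make the docstring of iter_spider_classes more concrete, here is a minimal standard-library sketch of that kind of filter. It is not Scrapy's actual code: the `Spider` class is a stand-in, and `iter_named_spider_classes`, `build_demo_module`, and the in-memory module `demo_spiders` are hypothetical names used only for illustration. The idea is simply to yield the classes defined in a module that subclass the base class and set a non-empty name.

```python
import inspect
import types


class Spider:
    """Stand-in for Scrapy's Spider base class (an assumption, not the real one)."""
    name = None


def iter_named_spider_classes(module, base=Spider):
    """Yield classes defined in `module` that subclass `base` and set a name."""
    for obj in vars(module).values():
        if (
            inspect.isclass(obj)
            and issubclass(obj, base)
            and obj.__module__ == module.__name__  # skip classes imported from elsewhere
            and getattr(obj, "name", None)         # skip classes without a name
        ):
            yield obj


def build_demo_module():
    """Build a hypothetical in-memory module holding two spider classes."""
    module = types.ModuleType("demo_spiders")
    source = (
        "class SomeSpider(Spider):\n"
        "    name = 'spname'\n"
        "\n"
        "class AbstractSpider(Spider):\n"
        "    pass\n"
    )
    # Expose the stand-in base class so the source above can subclass it.
    module.__dict__["Spider"] = Spider
    exec(source, module.__dict__)
    return module


if __name__ == "__main__":
    for spcls in iter_named_spider_classes(build_demo_module()):
        print(spcls.__name__, spcls.name)
```

Only SomeSpider passes the filter here: AbstractSpider has no name, and the base class itself was not defined in the demo module.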
Answer 1 (score: 1)

Inspired by @kev's answer, here is a function that inspects the spider classes:
    from scrapy.utils.misc import walk_modules
    from scrapy.utils.spider import iter_spider_classes

    # Meant to be a method of a class that keeps a self._spiders dict
    # mapping each spider name to its class.
    def _load_spiders(self, module='spiders.SomeSpider'):
        for mod in walk_modules(module):
            for spcls in iter_spider_classes(mod):
                self._spiders[spcls.name] = spcls
Then you can instantiate it:

    somespider = self._spiders['spname']()
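
For a self-contained picture of the name-to-class lookup, here is a rough sketch that runs without Scrapy. `SpiderRegistry` is a hypothetical class, and `Spider` is again a stand-in rather than the real base class; the `create(spider_name, **spider_kwargs)` signature is borrowed from the SpiderManager outline quoted in the other answer, but the body is only a guess at the general shape.

```python
class Spider:
    """Stand-in for Scrapy's Spider base class (an assumption, not the real one)."""
    name = None

    def __init__(self, **kwargs):
        # Store arbitrary keyword arguments as instance attributes.
        self.__dict__.update(kwargs)


class SomeSpider(Spider):
    name = "spname"


class SpiderRegistry:
    """Hypothetical registry mapping spider names to spider classes."""

    def __init__(self, spider_classes):
        # Classes without a name cannot be looked up, so they are skipped.
        self._spiders = {cls.name: cls for cls in spider_classes if cls.name}

    def create(self, spider_name, **spider_kwargs):
        """Instantiate the spider registered under `spider_name`."""
        try:
            spcls = self._spiders[spider_name]
        except KeyError:
            raise KeyError("Spider not found: %s" % spider_name)
        return spcls(**spider_kwargs)


if __name__ == "__main__":
    registry = SpiderRegistry([SomeSpider])
    spider = registry.create("spname", category="books")
    print(type(spider).__name__, spider.category)
```

With real spiders, the list passed to the registry would come from something like the walk_modules / iter_spider_classes loop above instead of being written out by hand.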