Calling multiple spiders from a script: the reactor does not stop

Asked: 2013-10-29 15:28:56

Tags: python python-2.7 web-scraping web-crawler scrapy

I wrote this script to run multiple spiders. It works for a single spider, but not for multiple spiders. I am new to Scrapy.

import sys
import os

# Make the Django project importable and point it at its settings module
c = os.getcwd()
os.chdir("myweb")
d = os.getcwd()
os.chdir(c)
sys.path.insert(0, d)
os.environ['DJANGO_SETTINGS_MODULE'] = 'myweb.settings'
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import CrawlerSettings
from scrapy import log, signals
from scrapy.xlib.pydispatch import dispatcher

def stop_reactor():
    reactor.stop()

def setup_crawler(spider):
    # One Crawler per spider; every crawl is driven by the same Twisted reactor
    crawler = Crawler(CrawlerSettings())
    crawler.configure()
    crawler.crawl(spider)
    dispatcher.connect(stop_reactor, signal=signals.spider_closed)
    crawler.start()

log.start(loglevel=log.DEBUG)

from aqaq.aqaq.spiders.spider import aqaqspider
spider = aqaqspider(domain='aqaq.com')
setup_crawler(spider)

from aqaq.aqaq.spiders.spider2 import DmozSpider
spider = DmozSpider(domain='shoptiques.com')
setup_crawler(spider)

reactor.run()
log.msg("------------>Running stopped")

Also, while one spider is running the other spider starts as well, but as soon as one spider stops, the whole thing stops.

1 Answer:

Answer 0 (score: 0)

If you are trying to set up the same crawler to run against multiple domains, that is documented in detail here: http://doc.scrapy.org/en/latest/topics/practices.html#running-multiple-spiders-in-the-same-process

Otherwise, removing this line may fix the problem:

dispatcher.connect(stop_reactor, signal=signals.spider_closed)

When one spider closes, it stops the entire reactor, so the second spider never gets a chance to finish.
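
A minimal sketch of the other option, keeping both spiders in one process and stopping the reactor only after every spider has closed. It reuses the old Crawler/CrawlerSettings API from the question; the spider classes and domains are the question's own, and the closed-spider counter is only an illustration, not part of Scrapy's API:

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import CrawlerSettings
from scrapy import log, signals
from scrapy.xlib.pydispatch import dispatcher

from aqaq.aqaq.spiders.spider import aqaqspider
from aqaq.aqaq.spiders.spider2 import DmozSpider

spiders = [aqaqspider(domain='aqaq.com'), DmozSpider(domain='shoptiques.com')]
closed = [0]  # mutable counter shared with the signal handler

def spider_closed(spider):
    # Only stop the reactor once the last spider has closed
    closed[0] += 1
    if closed[0] == len(spiders):
        reactor.stop()

dispatcher.connect(spider_closed, signal=signals.spider_closed)

for spider in spiders:
    crawler = Crawler(CrawlerSettings())
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()

log.start(loglevel=log.DEBUG)
reactor.run()  # blocks until spider_closed() stops the reactor

On newer Scrapy versions the same idea is covered by the documentation page linked above, so check that first if these imports no longer exist.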