A few months ago I ran into trouble running spiders from Django. This approach worked for me:
def crawllist(self, lists):
    runner = CrawlerRunner(get_project_settings())
    for topic in lists:
        logging.error("topic name is %s" % topic.name)
        runner.crawl(topic.type, author=topic.author, links=topic.base_url)
    d = runner.join()
    d.addBoth(lambda _: reactor.stop())
    logging.error("start crawl")
    reactor.run(installSignalHandlers=False)
But now it no longer works, and I get this error:
Internal Server Error: /CreateTopicServlet
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py", line 126, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py", line 124, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/zdc/Push/job/views.py", line 150, in CreateTopicServlet
    sp.crawllist([item])
  File "/home/zdc/Push/job/SpiderManager.py", line 59, in crawllist
    reactor.run(installSignalHandlers=False)
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 1260, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 1240, in startRunning
    ReactorBase.startRunning(self)
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 746, in startRunning
    raise error.ReactorAlreadyRunning()
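As far as I understand, Twisted's reactor can only be started once per process, so once the first request has started it (or left it running), any later reactor.run() in the same Django process fails. A minimal illustration of what I think is happening, not my actual code:

import threading
import time

from twisted.internet import error, reactor

# What the first request to the view effectively does: start the reactor.
threading.Thread(target=reactor.run, kwargs={"installSignalHandlers": False}).start()
time.sleep(1)  # give the reactor a moment to start

# What a later call in the same process runs into:
try:
    reactor.run(installSignalHandlers=False)
except error.ReactorAlreadyRunning:
    print("reactor is already running in this process")

reactor.callFromThread(reactor.stop)  # shut the background reactor down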
I have read all the answers about this, but none of them work for me. The spider runs fine when I run it locally without Django, but inside Django I keep hitting the problem that the reactor cannot be restarted. I tried an approach like this:
def crawl(self, type, url, author):
    print('crawl11')
    module_name = "Spidermanager.spiders.{}".format(type + 'spider')
    scrapy_var = importlib.import_module(module_name)  # do some dynamic import of selected spider
    spiderObj = scrapy_var.zhihuSpider(author=author, links=url)
    print(spiderObj.start_urls)
    runner = CrawlerRunner(get_project_settings())
    runner.crawl(spiderObj)
    print('crawl finished')
It gets rid of the reactor error, but the spider doesn't seem to actually run and nothing is crawled.
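From what I have read, runner.crawl() only schedules the crawl and returns a Deferred, so without a reactor running in the process nothing actually gets fetched. My current guess is that something like crochet is needed to keep a reactor alive in a background thread; a rough sketch of what I have in mind (untested in my project, and schedule_crawl is just a name I made up):

import crochet
from scrapy.crawler import CrawlerRunner
from scrapy.utils.project import get_project_settings

crochet.setup()  # start the Twisted reactor once, in a background thread

@crochet.run_in_reactor
def schedule_crawl(spider_cls, **kwargs):
    # runner.crawl() returns a Deferred; because crochet already has the
    # reactor running, the crawl should proceed in the background instead
    # of being scheduled and never executed.
    runner = CrawlerRunner(get_project_settings())
    return runner.crawl(spider_cls, **kwargs)

The idea would be to call something like schedule_crawl(scrapy_var.zhihuSpider, author=author, links=url) from the view and return immediately while the crawl runs in the reactor thread. Is that the right direction for running spiders from a Django view?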