A few months ago I ran into trouble running spiders from Django. This approach worked for me:
def crawllist(self, lists):
    runner = CrawlerRunner(get_project_settings())
    for topic in lists:
        logging.error("topic name is %s" % topic.name)
        runner.crawl(topic.type, author=topic.author, links=topic.base_url)
    d = runner.join()
    d.addBoth(lambda _: reactor.stop())
    logging.error("start crawl")
    reactor.run(installSignalHandlers=False)
But now it no longer works, and I get this error:
Internal Server Error: /CreateTopicServlet
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py", line 126, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/local/lib/python3.5/dist-packages/django/core/handlers/base.py", line 124, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/zdc/Push/job/views.py", line 150, in CreateTopicServlet
    sp.crawllist([item])
  File "/home/zdc/Push/job/SpiderManager.py", line 59, in crawllist
    reactor.run(installSignalHandlers=False)
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 1260, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 1240, in startRunning
    ReactorBase.startRunning(self)
  File "/usr/local/lib/python3.5/dist-packages/twisted/internet/base.py", line 746, in startRunning
    raise error.ReactorAlreadyRunning()
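As far as I understand, Twisted's reactor can only be started once per process, so once the first request has started it (or left it running), any later reactor.run() in the same Django process fails. A minimal illustration of what I think is happening, not my actual code:

import threading
import time

from twisted.internet import error, reactor

# What the first request to the view effectively does: start the reactor.
threading.Thread(target=reactor.run, kwargs={"installSignalHandlers": False}).start()
time.sleep(1)  # give the reactor a moment to start

# What a later call in the same process runs into:
try:
    reactor.run(installSignalHandlers=False)
except error.ReactorAlreadyRunning:
    print("reactor is already running in this process")

reactor.callFromThread(reactor.stop)  # shut the background reactor down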
I have read all the answers about this, but none of them work for me. The spider runs fine when I run it locally without Django, but inside Django I keep hitting the problem that the reactor cannot be restarted. I tried an approach like this:
def crawl(self, type, url, author):
    print('crawl11')
    module_name = "Spidermanager.spiders.{}".format(type + 'spider')
    scrapy_var = importlib.import_module(module_name)  # do some dynamic import of selected spider
    spiderObj = scrapy_var.zhihuSpider(author=author, links=url)
    print(spiderObj.start_urls)
    runner = CrawlerRunner(get_project_settings())
    runner.crawl(spiderObj)
    print('crawl finished')
It gets rid of the reactor error, but the spider doesn't seem to actually run and nothing is crawled.
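From what I have read, runner.crawl() only schedules the crawl and returns a Deferred, so without a reactor running in the process nothing actually gets fetched. My current guess is that something like crochet is needed to keep a reactor alive in a background thread; a rough sketch of what I have in mind (untested in my project, and schedule_crawl is just a name I made up):

import crochet
from scrapy.crawler import CrawlerRunner
from scrapy.utils.project import get_project_settings

crochet.setup()  # start the Twisted reactor once, in a background thread

@crochet.run_in_reactor
def schedule_crawl(spider_cls, **kwargs):
    # runner.crawl() returns a Deferred; because crochet already has the
    # reactor running, the crawl should proceed in the background instead
    # of being scheduled and never executed.
    runner = CrawlerRunner(get_project_settings())
    return runner.crawl(spider_cls, **kwargs)

The idea would be to call something like schedule_crawl(scrapy_var.zhihuSpider, author=author, links=url) from the view and return immediately while the crawl runs in the reactor thread. Is that the right direction for running spiders from a Django view?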