Question

I created a spider to crawl a website and return the results. When I run the script below, it produces this error:

Traceback (most recent call last):
  File "<ipython-input-1-e4ca99c5d3a6>", line 1, in <module>
    runfile('/Users/xxxx/PythonScripts_conda/Scrapers/venmo_scraper/venmo_scraper/spiders/venmo_scrapy.py', wdir='/Users/pproctor/PythonScripts_conda/Scrapers/venmo_scraper/venmo_scraper/spiders')
  File "/Users/xxxx/anaconda/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "/Users/xxxx/anaconda/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/Users/xxxx/PythonScripts_conda/Scrapers/venmo_scraper/venmo_scraper/spiders/venmo_scrapy.py", line 58, in <module>
    process.start()
  File "/Users/xxxx/anaconda/lib/python3.5/site-packages/scrapy/crawler.py", line 280, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "/Users/xxxx/anaconda/lib/python3.5/site-packages/twisted/internet/base.py", line 1198, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/Users/xxxx/anaconda/lib/python3.5/site-packages/twisted/internet/base.py", line 1178, in startRunning
    ReactorBase.startRunning(self)
  File "/Users/xxxx/anaconda/lib/python3.5/site-packages/twisted/internet/base.py", line 687, in startRunning
    raise error.ReactorNotRestartable()
ReactorNotRestartable

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


class VenmoSpider(scrapy.Spider):
    name = "zasa"

    def start_requests(self):
        url = 'https://www.xxasd.html'

        yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Save the raw page, named after the second-to-last URL path segment.
        page = response.url.split("/")[-2]
        filename = 'venmo-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

        # Yield one item per <div> element on the page.
        for ele in response.css('div'):
            yield {
                'participant': ele.css('a::text').extract_first(),
                'tags': ele.css('div.tags a.tag::text').extract(),
            }

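# Build a crawler process from the project settings and schedule the spider.
process = CrawlerProcess(get_project_settings())
process.crawl(VenmoSpider)
# Start the Twisted reactor; this call blocks until the crawl finishes.
process.start()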

I have followed the Scrapy documentation but have not found a clear solution. When I remove the process.start() call, the script runs but does not crawl the URL.
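A possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the script is launched through Spyder's runfile, which executes it inside an interpreter session that stays alive between runs, and a Twisted reactor cannot be started again once it has run and stopped in the same process. Under that assumption, one workaround is to launch the script in a new interpreter process each time. The sketch below uses only the standard library; `run_in_fresh_interpreter` is a hypothetical helper name, not a Scrapy or Spyder API.

```python
import subprocess
import sys


def run_in_fresh_interpreter(script_path, timeout=None):
    """Run a Python script in a brand-new interpreter process.

    Hypothetical helper, not part of Scrapy or Spyder. Each call starts a
    separate process, so any Twisted reactor the script creates is new.
    Returns a tuple of (exit_code, combined stdout/stderr text).
    """
    completed = subprocess.run(
        [sys.executable, script_path],  # same Python that runs this helper
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,       # merge stderr into stdout
        universal_newlines=True,        # decode output as text
        timeout=timeout,
    )
    return completed.returncode, completed.stdout
```

For instance, calling run_in_fresh_interpreter('venmo_scrapy.py') from the console would be expected to start the crawl in its own process on every call, at the cost of some process startup overhead.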

Scrapy ReactorNotRestartable error

0 answers: