I created a spider to crawl a website and return the results. When I run the script below, it raises this error:
Traceback (most recent call last):
  File "<ipython-input-1-e4ca99c5d3a6>", line 1, in <module>
    runfile('/Users/xxxx/PythonScripts_conda/Scrapers/venmo_scraper/venmo_scraper/spiders/venmo_scrapy.py', wdir='/Users/pproctor/PythonScripts_conda/Scrapers/venmo_scraper/venmo_scraper/spiders')
  File "/Users/xxxx/anaconda/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "/Users/xxxx/anaconda/lib/python3.5/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/Users/xxxx/PythonScripts_conda/Scrapers/venmo_scraper/venmo_scraper/spiders/venmo_scrapy.py", line 58, in <module>
    process.start()
  File "/Users/xxxx/anaconda/lib/python3.5/site-packages/scrapy/crawler.py", line 280, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "/Users/xxxx/anaconda/lib/python3.5/site-packages/twisted/internet/base.py", line 1198, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/Users/xxxx/anaconda/lib/python3.5/site-packages/twisted/internet/base.py", line 1178, in startRunning
    ReactorBase.startRunning(self)
  File "/Users/xxxx/anaconda/lib/python3.5/site-packages/twisted/internet/base.py", line 687, in startRunning
    raise error.ReactorNotRestartable()
ReactorNotRestartable

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


class VenmoSpider(scrapy.Spider):
    name = "zasa"

    def start_requests(self):
        url = 'https://www.xxasd.html'
        yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'venmo-%s.html' % page
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)
        for ele in response.css('div'):
            yield {
                'participant': ele.css('a::text').extract_first(),
                'tags': ele.css('div.tags a.tag::text').extract()
            }


process = CrawlerProcess(get_project_settings())
process.crawl(VenmoSpider)
process.start()

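I have followed the Scrapy documentation but have not found a clear solution. When I remove the process.start() call, the script runs without error but does not crawl the URL.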
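
One thing that may be worth trying, on the assumption that the error comes from the Spyder/IPython console reusing one long-lived interpreter: a Twisted reactor cannot be started a second time in the same process, so launching the script in a new interpreter on each run would give it a fresh reactor. The following is only a minimal sketch of that idea. The helper name run_in_fresh_interpreter and its cwd parameter are hypothetical and not part of Scrapy or Spyder.

```python
import subprocess
import sys


def run_in_fresh_interpreter(script_path, cwd=None):
    """Run a Python script in a new interpreter and return its exit code.

    Hypothetical helper, not part of Scrapy or Spyder. Each call starts a
    separate process, so a Twisted reactor started by the script is created
    from scratch every time.
    """
    # sys.executable is the interpreter running this code, so the child
    # process uses the same environment and installed packages.
    completed = subprocess.run([sys.executable, script_path], cwd=cwd)
    return completed.returncode
```

From the console this could be called with the path to venmo_scrapy.py, passing the project directory as cwd so that get_project_settings() can locate the project configuration. A return value of 0 would indicate that the script exited normally.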