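I am trying to use Python's Scrapy library together with IBM Cloud Functions. I want to pass some arguments through process.crawl. How can I do that?
My code is as follows: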
import scrapy
from scrapy.crawler import CrawlerProcess


class MySpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com/"]

    def __init__(self, make=None, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        init_url = "http://quotes.toscrape.com/"
        self.start_urls = [init_url]

    def parse(self, response):
        title = response.css(".header-box > div a::text").extract_first()
        yield {"title": title}


process = CrawlerProcess({'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'})
process.crawl(MySpider)  # <-------- see "Explanation" below
process.start()
Explanation
I found here that it can be done as follows:
process.crawl(MySpider, make="Audi")
But when I try this, my editor reports an error:
expected type 'dict' got 'str' instead
What am I doing wrong?
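For context, Scrapy documents that extra positional and keyword arguments given to process.crawl are passed on to the spider's __init__. The sketch below imitates only that forwarding in plain Python, so it runs without Scrapy installed. FakeProcess and DemoSpider are hypothetical stand-ins, not Scrapy classes, and this is not Scrapy's actual implementation.

```python
class DemoSpider:
    """Hypothetical stand-in for a scrapy.Spider subclass."""

    def __init__(self, make=None, *args, **kwargs):
        # Keep the argument on the instance so later methods could use it.
        self.make = make


class FakeProcess:
    """Hypothetical stand-in that mimics only the argument forwarding."""

    def crawl(self, spider_cls, *args, **kwargs):
        # Everything after the spider class is handed to its constructor.
        return spider_cls(*args, **kwargs)


spider = FakeProcess().crawl(DemoSpider, make="Audi")
print(spider.make)
```

An editor warning of this kind comes from static inspection, so on its own it does not show that the call fails at run time.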
Update
I am using the Scrapy spider in IBM Cloud Functions, so my code looks like this:
import scrapy
from scrapy.crawler import CrawlerProcess


class MySpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com/"]

    def __init__(self, make=None, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        print("Make {}".format(make))

    def parse(self, response):
        title = response.css(".header-box > div a::text").extract_first()
        yield {"title": title}


def main(params):
    process = CrawlerProcess({'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'})
    # <------- my editor warns on the next line: expected type 'dict' got 'str' instead
    process.crawl(MySpider, make="Audi")
    process.start()
    return {"joke": "Some shit joke"}
When I run main({}) from the console, I get the following error:
2018-06-22 08:42:45 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024
Traceback (most recent call last):
  File "", line 1, in
  File "./main.py", line 30, in main
  File "/Users/boris/Projects/IBM-cloud/virtualenv/lib/python3.6/site-packages/scrapy/crawler.py", line 291, in start
    reactor.run(installSignalHandlers=False)  # blocking call
  File "/Users/boris/Projects/IBM-cloud/virtualenv/lib/python3.6/site-packages/twisted/internet/base.py", line 1260, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "/Users/boris/Projects/IBM-cloud/virtualenv/lib/python3.6/site-packages/twisted/internet/base.py", line 1240, in startRunning
    ReactorBase.startRunning(self)
  File "/Users/boris/Projects/IBM-cloud/virtualenv/lib/python3.6/site-packages/twisted/internet/base.py", line 748, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable
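
Twisted raises ReactorNotRestartable when a reactor that has already run and stopped is started again in the same Python process, which can happen if main is called more than once in one console session. One possible workaround, sketched here under that assumption, is to run each crawl in a fresh interpreter. The helper name run_in_fresh_interpreter and the placeholder script are hypothetical; in real use the script would contain the crawl code.

```python
import subprocess
import sys


def run_in_fresh_interpreter(script):
    """Run `script` in a new Python process and return its stdout as text."""
    result = subprocess.run(
        [sys.executable, "-c", script],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


# Placeholder script standing in for the crawl; each call gets its own process,
# so nothing from a previous run (such as a stopped reactor) is reused.
first = run_in_fresh_interpreter("print('crawl finished')")
second = run_in_fresh_interpreter("print('crawl finished')")
print(first, second)
```

Because every call starts a new process, repeated invocations do not share reactor state, at the cost of process start-up time on each run.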