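I am trying to pass values dynamically to my spider so that it can crawl pages. When I run it, it shows an error and does not continue. Please let me know what is wrong with my code.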
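from scrapy import Field, Item
from scrapy.linkextractors.lxmlhtml import LxmlLinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class MyItem(Item):
    url = Field()
    title = Field()


class someSpider(CrawlSpider):
    name = 'crawlim'
    sitename = ''
    items = []

    def __init__(self, *args, **kwargs):
        # 'urls' and 'domains' are expected as comma-separated strings
        urls = kwargs.pop('urls', [])
        domains = kwargs.pop('domains', [])
        if domains:
            self.allowed_domains = domains.split(',')
        if urls:
            self.start_urls = urls.split(',')
            self.logger.info(self.start_urls)
        # Rules must be set before CrawlSpider.__init__ compiles them;
        # the parse_obj callback is not shown in this snippet
        someSpider.rules = (Rule(LxmlLinkExtractor(allow=(), unique=True), callback='parse_obj', follow=True),)
        super(someSpider, self).__init__(*args, **kwargs)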
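
To check the argument handling on its own, without Scrapy, here is a minimal sketch of the comma-splitting step. It assumes the `urls` and `domains` values arrive as comma-separated strings, as the `split(',')` calls suggest. `split_arg` is a hypothetical helper name used only for illustration; it is not part of Scrapy.

```python
def split_arg(value):
    """Turn a comma-separated string (or None / empty value) into a list.

    Whitespace around each part is stripped and empty parts are dropped.
    Lists and tuples are accepted as well and cleaned the same way.
    """
    if not value:
        return []
    if isinstance(value, (list, tuple)):
        parts = [str(v) for v in value]
    else:
        parts = value.split(",")
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    # Example values only; any comma-separated string works the same way.
    print(split_arg("https://example.com, https://example.org"))
    print(split_arg(None))
```

Inside `__init__` this could replace the two `split(',')` calls, for example `self.start_urls = split_arg(kwargs.pop('urls', None))`, so that stray spaces or a trailing comma do not end up as empty or malformed start URLs.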
Error message:
INFO:scrapy.core.engine:Spider opened
2019-03-07 16:59:14 [scrapy.core.engine] INFO: Spider opened
INFO:scrapy.extensions.logstats:Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-03-07 16:59:14 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
INFO:crawlim:Spider opened: crawlim
2019-03-07 16:59:14 [crawlim] INFO: Spider opened: crawlim
DEBUG:scrapy.extensions.telnet:Telnet console listening on 127.0.0.1:6063
2019-03-07 16:59:14 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6063
error