I want to use two rules in my Scrapy spider and combine them with a logical OR (||).
Here is the code:
for urlrule in urlrules:
    if urlrule['rule'] is not 'nan':
        allSpider.rules = [Rule(LinkExtractor(allow=(urlrule['rule'],), ), callback="parse_items", follow=True)]
    elif urlrule['restrictXP'] is not 'nan':
        allSpider.rules = [Rule(LinkExtractor(restrict_xpaths=urlrule['restrictXP']), callback='parse_items', follow=True)]
    else:
        print('Undefined Rule!')
        break
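The values tested in conditions such as if urlrule['rule'] is not 'nan' are read from a CSV file.
The problem is that only the first branch of the if is ever checked. When I run it, I get the following output: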
Unhandled error in Deferred:
2018-09-30 13:18:58 [twisted] CRITICAL:
Unhandled error in Deferred:
2018-09-30 13:18:58 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/crawler.py", line 98, in crawl
    six.reraise(*exc_info)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/crawler.py", line 79, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/crawler.py", line 102, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/spiders/crawl.py", line 100, in from_crawler
    spider = super(CrawlSpider, cls).from_crawler(crawler, *args, **kwargs)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/spiders/__init__.py", line 51, in from_crawler
    spider = cls(*args, **kwargs)
  File "/home/reyhaneh/PycharmProjects/total/total.py", line 25, in __init__
    allSpider.rules = [Rule(LinkExtractor(allow=(urlrule['rule'],), ), callback="parse_items", follow=True)]
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/linkextractors/lxmlhtml.py", line 116, in __init__
    canonicalize=canonicalize, deny_extensions=deny_extensions)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/linkextractors/__init__.py", line 57, in __init__
    for x in arg_to_iter(allow)]
  File "/usr/lib/python2.7/re.py", line 194, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.7/re.py", line 247, in _compile
    raise TypeError, "first argument must be string or compiled pattern"
TypeError: first argument must be string or compiled pattern
How can I fix this?
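For reference, here is a minimal sketch of what may be going on. It rests on an assumption that the post does not confirm: that empty CSV cells arrive as a float NaN rather than as the text 'nan'. If so, an identity test against the string 'nan' would always pass, and the NaN would reach re.compile, which matches the TypeError in the traceback. The helper names is_missing and extractor_kwargs are hypothetical, and the sketch uses only the standard library so that it runs without Scrapy.

```python
import math
import re


def is_missing(value):
    """Return True for None, a float NaN, an empty string, or the text 'nan'."""
    if value is None:
        return True
    if isinstance(value, float):
        return math.isnan(value)
    return str(value).strip().lower() in ("", "nan")


def extractor_kwargs(urlrule):
    """Hypothetical helper: one LinkExtractor keyword dict per usable column."""
    kwargs_list = []
    if not is_missing(urlrule.get("rule")):
        kwargs_list.append({"allow": (urlrule["rule"],)})
    if not is_missing(urlrule.get("restrictXP")):
        kwargs_list.append({"restrict_xpaths": urlrule["restrictXP"]})
    return kwargs_list


# A float NaN is never the same object as the string 'nan', so an
# identity test against 'nan' always lets it through.
marker = "nan"
empty_cell = float("nan")  # assumed shape of an empty CSV cell
print(empty_cell is not marker)  # True

# Passing that float on to re.compile fails, as in the traceback.
try:
    re.compile(empty_cell)
except TypeError as exc:
    print("re.compile rejected it:", exc)

# Example row with made-up values: only the XPath column is usable.
row = {"rule": float("nan"), "restrictXP": "//div[@class='news']"}
print(extractor_kwargs(row))
```

Under the same assumption, each returned dict could be passed on as Rule(LinkExtractor(**kwargs), callback='parse_items', follow=True). Putting all resulting rules in one rules list, instead of choosing between them with if/elif, should behave like a logical OR, because a CrawlSpider follows links matched by any of its rules.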