How do I use a logical OR in Scrapy?

Asked: 2018-09-30 10:43:06

Tags: python scrapy logical-operators rules

I want to use two rules in a spider and combine them with a logical OR (||).

Here is the code:

for urlrule in urlrules:
    if urlrule['rule'] is not 'nan':
        allSpider.rules = [Rule(LinkExtractor(allow=(urlrule['rule'],), ), callback="parse_items", follow=True)]
    elif urlrule['restrictXP'] is not 'nan':
        allSpider.rules = [Rule(LinkExtractor(restrict_xpaths=urlrule['restrictXP']), callback='parse_items', follow=True)]
    else:
        print('Undefined Rule!')
        break

The values tested in if urlrule['rule'] is not 'nan' are read from a CSV file.
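A note on that check, as a minimal sketch rather than a confirmed diagnosis: the question does not say how the CSV is loaded. Assuming it is read with pandas, an empty cell arrives as a float NaN, not as the string 'nan'. An identity test (is not) against a string is then always true, and the float ends up being passed to the regex compiler. The helper is_missing and the sample values below are hypothetical names for illustration only.

```python
import math
import re


def is_missing(value):
    """Return True for an empty CSV cell: None, float NaN, or blank/'nan' text."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip().lower() in ("", "nan")


# Assumption: this is what an empty cell looks like after loading with pandas.
empty_cell = float("nan")
marker = "nan"

# Identity comparison: a float object is never the same object as a string,
# so this expression is True even though the cell is empty.
identity_says_present = empty_cell is not marker

# Handing the float to the regex compiler raises a TypeError,
# the same kind of failure the traceback ends with.
try:
    re.compile(empty_cell)
    compile_failed = False
except TypeError:
    compile_failed = True

print(identity_says_present, compile_failed, is_missing(empty_cell))
```

A value-based test such as is_missing(urlrule['rule']) would treat both the float NaN and the literal text 'nan' as "no rule given".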

The problem is that only the first branch of the if is ever checked. When I run it, I get the following:

Unhandled error in Deferred:
2018-09-30 13:18:58 [twisted] CRITICAL: 
Unhandled error in Deferred:

2018-09-30 13:18:58 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
    result = g.send(result)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/crawler.py", line 98, in crawl
    six.reraise(*exc_info)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/crawler.py", line 79, in crawl
    self.spider = self._create_spider(*args, **kwargs)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/crawler.py", line 102, in _create_spider
    return self.spidercls.from_crawler(self, *args, **kwargs)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/spiders/crawl.py", line 100, in from_crawler
    spider = super(CrawlSpider, cls).from_crawler(crawler, *args, **kwargs)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/spiders/__init__.py", line 51, in from_crawler
    spider = cls(*args, **kwargs)
  File "/home/reyhaneh/PycharmProjects/total/total.py", line 25, in __init__
    allSpider.rules = [Rule(LinkExtractor(allow=(urlrule['rule'],), ), callback="parse_items", follow=True)]
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/linkextractors/lxmlhtml.py", line 116, in __init__
    canonicalize=canonicalize, deny_extensions=deny_extensions)
  File "/home/reyhaneh/.local/lib/python2.7/site-packages/scrapy/linkextractors/__init__.py", line 57, in __init__
    for x in arg_to_iter(allow)]
  File "/usr/lib/python2.7/re.py", line 194, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.7/re.py", line 247, in _compile
    raise TypeError, "first argument must be string or compiled pattern"
TypeError: first argument must be string or compiled pattern

How can I fix this?
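One possible direction, offered as a sketch under assumptions and not as a tested fix for this spider: a CrawlSpider tries every entry in its rules list, so a link is followed if any one rule extracts it, which behaves like an OR. Putting allow and restrict_xpaths on the same LinkExtractor narrows the match instead, which behaves like an AND. The code in the question also uses if/elif and reassigns rules on every loop pass, so at most one condition from the last row survives. The helper names is_missing and build_extractor_specs and the sample rows are hypothetical; the sketch only collects the keyword arguments, so it runs without Scrapy installed.

```python
import math


def is_missing(value):
    """Return True for an empty CSV cell: None, float NaN, or blank/'nan' text."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip().lower() in ("", "nan")


def build_extractor_specs(urlrules):
    """Return one LinkExtractor keyword dict per condition that is present."""
    specs = []
    for urlrule in urlrules:
        # Two independent ifs (not if/elif), so one row can contribute both
        # an allow pattern and an XPath restriction as separate entries.
        if not is_missing(urlrule.get("rule")):
            specs.append({"allow": (urlrule["rule"],)})
        if not is_missing(urlrule.get("restrictXP")):
            specs.append({"restrict_xpaths": urlrule["restrictXP"]})
    return specs


# Hypothetical rows standing in for the CSV contents.
rows = [
    {"rule": r"/articles/\d+", "restrictXP": "//div[@id='nav']"},
    {"rule": float("nan"), "restrictXP": "//ul[@class='pager']"},
]
specs = build_extractor_specs(rows)
print(specs)
```

Each dict could then be turned into its own rule, for example Rule(LinkExtractor(**spec), callback="parse_items", follow=True), with the complete list assigned to rules once after the loop instead of being overwritten inside it.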

0 answers:

No answers yet