I am trying to crawl with the spider below, but it never calls the callback function. My spider:
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class ScreenerSpider(CrawlSpider):
    name = 'screener'
    allowed_domains = ['finviz.com']
    start_urls = ['https://finviz.com/screener.ashx']

    rules = [
        Rule(LinkExtractor(allow=['https://finviz.com/screener.ashx?v=111&r=[0-9]{2}']),
             callback='parse_screener', follow=True)
    ]

    def parse_screener(self, response):
        self.logger.warning('lalala')
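When I run this spider, it does not print 'lalala' in the terminal, i.e. the parse_screener function is never called. I wrote the spider the way the documentation shows. What is the problem?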
Answer 0: (score: 2)
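The problem is your allow pattern. It is a regular expression, so special characters such as '?' have to be escaped. Unescaped, '?' makes the preceding 'x' optional instead of matching a literal question mark, so the pattern never matches the screener URLs and the callback is never called. This works (note the backslash before the '?'):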
Rule(LinkExtractor(allow=[r'https://finviz.com/screener.ashx\?v=111&r=[0-9]{2}']), callback='parse_screener', follow=True)
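To see the difference without running a crawl, the two patterns can be checked directly with Python's re module. This is only a sketch: the sample URL (with r=21) is made up for illustration, and it assumes the link extractor applies allow patterns with search-style matching. The last variant, which builds the pattern with re.escape, is an alternative not taken from the answer above.

```python
import re

# Hypothetical link of the kind the rule is meant to follow.
url = 'https://finviz.com/screener.ashx?v=111&r=21'

# Original pattern: the bare '?' makes the 'x' optional, so the regex
# expects 'ash' or 'ashx' followed directly by 'v=111'.
unescaped = r'https://finviz.com/screener.ashx?v=111&r=[0-9]{2}'

# Fixed pattern: '\?' matches a literal question mark.
escaped = r'https://finviz.com/screener.ashx\?v=111&r=[0-9]{2}'

# Alternative: let re.escape handle the literal part of the URL.
auto_escaped = re.escape('https://finviz.com/screener.ashx?v=111&r=') + r'[0-9]{2}'

unescaped_matches = re.search(unescaped, url) is not None
escaped_matches = re.search(escaped, url) is not None
auto_escaped_matches = re.search(auto_escaped, url) is not None

print(unescaped_matches)     # False
print(escaped_matches)       # True
print(auto_escaped_matches)  # True
```

Writing the pattern as a raw string (the r prefix) keeps the backslash intact and avoids Python's warning about the invalid escape sequence '\?' in ordinary string literals.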