I want to visit every link on the page https://whitney.org/exhibitions/2012-biennial that contains /2012-biennial/.
Here is what I tried:

    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule

    class SpiderSpider(CrawlSpider):
        name = "whit"
        start_urls = ['https://whitney.org/exhibitions/2012-biennial']
        rules = [Rule(LinkExtractor(allow='2012-biennial/.*'), callback='parse', follow=True)]

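But it only parses the start URL. I want to follow and parse a bunch of links that look like /2012-biennial/some-artist. I have searched Stack Overflow quite a bit and I can't figure out what I'm getting wrong; this seems like the simplest thing. Thanks.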
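
To help narrow this down, here is a rough standalone check of the `allow` pattern using only the standard library. It assumes that `LinkExtractor` treats `allow` as a regular expression searched against the absolute URL, and the hrefs in `SAMPLE_HREFS` are made up for illustration (the real page may use different paths). `matching_links` is a hypothetical helper, not part of Scrapy.

```python
import re
from urllib.parse import urljoin

# Pattern copied from the rule in the spider above.
ALLOW = re.compile(r'2012-biennial/.*')

BASE = 'https://whitney.org/exhibitions/2012-biennial'

# Hypothetical hrefs, invented for this sketch; not taken from the real page.
SAMPLE_HREFS = [
    '/exhibitions/2012-biennial/some-artist',
    '/exhibitions/2012-biennial',
    '/exhibitions/2014-biennial/some-artist',
]


def matching_links(base, hrefs, pattern):
    """Resolve each href against base and keep the URLs the pattern finds."""
    absolute = [urljoin(base, href) for href in hrefs]
    return [url for url in absolute if pattern.search(url)]


matched = matching_links(BASE, SAMPLE_HREFS, ALLOW)
print(matched)
```

Under those assumptions only the /2012-biennial/some-artist style link is kept, so the pattern itself looks reasonable. One other thing that may be worth ruling out: the Scrapy documentation advises against using `parse` as a rule callback in a `CrawlSpider`, because `CrawlSpider` uses `parse` itself to implement its logic.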