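I have a very simple Scrapy CrawlSpider, and I give it one simple rule: "crawl/follow any link that contains '/search/listings'". But the spider doesn't crawl or follow any of these links.
I have confirmed that the start URL contains many links whose href includes '/search/listings', so the links are there.
Any ideas what is going wrong?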
The start URL "http://www.mywebsite.com/results" contains the links I want the rule to apply to. Here is the spider:
    class MySpider(CrawlSpider):
        name = "MySpider"
        allowed_domains = ["mywebsite.com"]
        start_urls = ["http://www.mywebsite.com/results"]
        rules = [Rule(LinkExtractor(allow=['/search/listings(.*)']), callback="parse2")]

        def parse2(self, response):
            # This function is never called
            log.start("log.txt")
            log.msg("Page crawled: " + response.url)
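
To rule out the pattern itself, here is a minimal sketch that uses only Python's re module. It assumes the link extractor applies the allow pattern as a regex search against each absolute URL. The sample URLs and the matching_urls helper are hypothetical, made up for illustration, and are not the real links on my page.

```python
import re

# The same pattern passed to LinkExtractor(allow=...) in the spider above.
ALLOW_PATTERN = re.compile(r"/search/listings(.*)")

# Hypothetical absolute URLs standing in for links found on the start page.
sample_urls = [
    "http://www.mywebsite.com/search/listings?page=2",
    "http://www.mywebsite.com/search/listings/123",
    "http://www.mywebsite.com/about",
]


def matching_urls(urls, pattern=ALLOW_PATTERN):
    """Return the URLs that the pattern matches anywhere in the string."""
    return [url for url in urls if pattern.search(url)]


matched = matching_urls(sample_urls)
print(matched)
```

If the expected links show up in the printed list, the regex is probably not the reason parse2 is never called.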