Question

I have a very simple Scrapy CrawlSpider, and I gave it one simple rule: "scrape/follow any link that contains '/search/listings'". But the spider is not crawling or following any of these links.

I have confirmed that the start URL contains many links whose href contains '/search/listings', so the links are there.

Any ideas what is going wrong?

The start URL "http://www.mywebsite.com/results" contains the links that I want the rule to apply to. Here is the spider:

class MySpider(CrawlSpider):

    name = "MySpider"
    allowed_domains = ["mywebsite.com"]
    start_urls = ["http://www.mywebsite.com/results"]
    rules = [Rule(LinkExtractor(allow=['/search/listings(.*)']), callback="parse2")]

    def parse2(self, response):

        # This function is never called
        log.start("log.txt")
        log.msg("Page crawled: " + response.url)
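One way to narrow this down is to test the allow pattern outside Scrapy. As far as I understand, LinkExtractor treats each `allow` entry as a regular expression and searches for it in the absolute URL of every link. The sketch below imitates only that step with the standard library; it does not reproduce Scrapy's other filters (allowed_domains, deny patterns, ignored extensions). The hrefs in `sample_hrefs` and the helper `matching_links` are hypothetical, made up for illustration, and are not taken from the real page.

```python
import re
from urllib.parse import urljoin

# Pattern copied from the rule in the spider above.
ALLOW_PATTERN = re.compile(r"/search/listings(.*)")


def matching_links(base_url, hrefs):
    """Resolve each href against base_url and keep the URLs the pattern matches."""
    absolute = [urljoin(base_url, href) for href in hrefs]
    # re.search matches anywhere in the URL, not only at the start.
    return [url for url in absolute if ALLOW_PATTERN.search(url)]


# Hypothetical hrefs for illustration only; replace with hrefs from the real page.
sample_hrefs = ["/search/listings?page=2", "/about", "search/listings/42"]
matches = matching_links("http://www.mywebsite.com/results", sample_hrefs)
print(matches)
```

If the real hrefs pass this check, the pattern itself is probably not what stops the rule from firing, and the cause is more likely elsewhere (for example, links that are added by JavaScript and are absent from the downloaded HTML).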

Scrapy simple rule not following links

0 answers: