Should I scrape the aggregate block and the pagination in the same rules?

Asked: 2015-12-25 11:28:56

Tags: python scrapy

I want to crawl both the pagination links and the aggregate block (the block of per-ad URLs):

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class FiregunsSpider(CrawlSpider):
    name = 'centerfireguns'
    allowed_domains = ['centerfireguns.com']
    start_urls = ['http://www.centerfireguns.com/firearms.html']

    rules = (
        # This XPath matches the pagination ("next") link on start_urls.
        Rule(SgmlLinkExtractor(restrict_xpaths=('//a[contains(@class, "i-next")][1]',)), follow=True),
        # This XPath matches the aggregate block of product links.
        Rule(SgmlLinkExtractor(restrict_xpaths=('//a[contains(@class, "product-image")]',)), callback='parse_item', follow=True),
    )

But only the first rule runs; the other rule never fires.

0 Answers:

No answers yet.