I want to crawl through the pagination and also scrape the listing block (one URL per ad):
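# Import paths for older Scrapy releases that still ship SgmlLinkExtractor.
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class FiregunsSpider(CrawlSpider):
    name = 'centerfireguns'
    allowed_domains = ['centerfireguns.com']
    start_urls = ['http://www.centerfireguns.com/firearms.html']
    rules = (
        # XPath for the pagination ("next page") link on the start_urls page.
        Rule(SgmlLinkExtractor(allow=(), restrict_xpaths=('//a[contains(@class, "i-next")][1]')), follow=True),
        # XPath for each ad link in the listing block; matches go to parse_item.
        Rule(SgmlLinkExtractor(allow=(), restrict_xpaths=('//a[contains(@class,"product-image")]')), callback='parse_item', follow=True),
    )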
But only the first rule runs; the other rule never does.
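
One way to narrow this down is to check, outside of Scrapy, whether each class filter matches any links at all in the page markup. The following is only a rough sketch using the standard library: the helper names (`ClassLinkCollector`, `links_with_class`) and the sample HTML are made up for illustration and are not taken from the real site. It mimics the `contains(@class, ...)` part of the two XPath expressions with a plain substring test and ignores the `[1]` position filter.

```python
from html.parser import HTMLParser


class ClassLinkCollector(HTMLParser):
    """Collect href values of <a> tags whose class attribute contains a substring."""

    def __init__(self, class_substring):
        super().__init__()
        self.class_substring = class_substring
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        css_class = attr_map.get("class") or ""
        href = attr_map.get("href")
        # Substring test, analogous to XPath contains(@class, "...").
        if href and self.class_substring in css_class:
            self.links.append(href)


def links_with_class(html, class_substring):
    """Return hrefs of all <a> tags whose class contains class_substring."""
    collector = ClassLinkCollector(class_substring)
    collector.feed(html)
    return collector.links


# Hypothetical sample markup (an assumption, NOT copied from the real page).
SAMPLE_HTML = """
<div class="products">
  <a class="product-image" href="/firearms/item-1.html">Item 1</a>
  <a class="product-image" href="/firearms/item-2.html">Item 2</a>
</div>
<div class="pages">
  <a class="next i-next" href="/firearms.html?p=2">Next</a>
</div>
"""

pagination_links = links_with_class(SAMPLE_HTML, "i-next")
product_links = links_with_class(SAMPLE_HTML, "product-image")
print("pagination:", pagination_links)
print("products:", product_links)
```

If the same check against the saved HTML of the real page returns an empty list for `product-image`, the second rule has nothing to extract; if it returns links, the problem is more likely in how the rules are declared or in the `parse_item` callback.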