Question

My spider follows href links but not data-url links

I have a Scrapy CrawlSpider that starts at the URL https://www.grainger.com/category/tools/drills-and-drivers/standard-drills-and-drivers

I have one rule that follows links to more category pages, and a second rule that should follow links to product pages.

I have tried restricting the XPath for the product rule (restrict_xpaths=('//li[@data-url]')), but no luck.

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class GraingerSpider(CrawlSpider):
    name = 'grainger.com'
    allowed_domains = ['grainger.com']
    start_urls = [
        'https://www.grainger.com/category/tools/drills-and-drivers/standard-drills-and-drivers'
    ]

    rules = (
        # Follow links to further category pages, skipping the e-catalog
        Rule(LinkExtractor(allow=('/category/tools/', ), deny=('/ecatalog/', ))),

        # Extract product links (href or data-url) and parse them with
        # the spider's parse_item method (not shown)
        Rule(
            LinkExtractor(
                allow=('/product/', ),
                attrs=('href', 'data-url', ),
                restrict_xpaths=('//li[@data-url]'),
            ),
            callback='parse_item',
        ),
    )

What happens is that the spider keeps finding new /category/tools/ pages, but it never finds the product pages listed under data-url.

Scrapy crawler: I can't extract and follow the links under data-url

0 answers: