Scrapy crawl spider: I can't extract and follow links under data-url

Time: 2019-01-22 00:04:36

Tags: python-3.x scrapy-spider

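My spider follows href links, but it does not follow data-url links.

I have a Scrapy CrawlSpider that starts at the URL https://www.grainger.com/category/tools/drills-and-drivers/standard-drills-and-drivers

I have one rule that follows links to more categories, and a second rule that should follow links to product pages.

I have tried restricting the XPath for the product rule (restrict_xpaths=('//li[@data-url]')), but with no luck.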

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class GraingerSpider(CrawlSpider):
    name = 'grainger.com'
    allowed_domains = ['grainger.com']
    start_urls = [
        'https://www.grainger.com/category/tools/drills-and-drivers/standard-drills-and-drivers',
    ]

    rules = (
        # Follow links to further category pages, skipping the e-catalog.
        Rule(LinkExtractor(allow=('/category/tools/',), deny=('/ecatalog/',))),

        # Extract product links (intended to come from li[@data-url]) and
        # parse them with the spider's parse_item method (not shown here).
        Rule(
            LinkExtractor(
                allow=('/product/',),
                attrs=('href', 'data-url'),
                restrict_xpaths=('//li[@data-url]',),
            ),
            callback='parse_item',
        ),
    )

What happens is that the spider keeps finding new /category/tools/ pages, but it never finds the product pages listed under data-url.

0 answers:

No answers yet