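My spider follows href links but not data-url links
I have a Scrapy CrawlSpider that hits the URL https://www.grainger.com/category/tools/drills-and-drivers/standard-drills-and-drivers
I have one rule to follow more category pages, and one rule to follow when it comes to products.
I have tried restricting the XPath for the product rule (restrict_xpaths=('//li[@data-url]')), but with no luck.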
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class GraingerSpider(CrawlSpider):
    name = 'grainger.com'
    allowed_domains = ['grainger.com']
    start_urls = [
        'https://www.grainger.com/category/tools/drills-and-drivers/standard-drills-and-drivers'
    ]
    rules = (
        # Follow further category pages, skipping the e-catalog
        Rule(LinkExtractor(allow=('/category/tools/', ), deny=('/ecatalog/', ))),
        # Extract links matching '/product/' and parse them with the spider's method parse_item
        Rule(LinkExtractor(allow=('/product/', ), attrs=('href', 'data-url', ), restrict_xpaths=('//li[@data-url]')), callback='parse_item'),
    )
What happens is that the spider keeps finding new /category/tools/ pages but never finds the product pages under data-url.
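
One thing that may be worth checking: Scrapy's LinkExtractor only scans the tags named in its `tags` argument, which defaults to ('a', 'area'), so `attrs` is never read on an `<li>` unless 'li' is added to `tags`. The sketch below is a simplified standard-library model of that tag/attribute filtering, not Scrapy's actual implementation. The markup in SAMPLE_HTML and the names AttrLinkCollector and collect_links are hypothetical, invented for illustration, and the real Grainger page structure is assumed rather than verified.

```python
from html.parser import HTMLParser

# Hypothetical markup modelled on the question: the product URL sits in a
# data-url attribute on an <li>, while category links are ordinary <a href>.
SAMPLE_HTML = """
<ul>
  <li data-url="/product/example-drill-1"><span>Drill 1</span></li>
  <li data-url="/product/example-drill-2"><span>Drill 2</span></li>
</ul>
<a href="/category/tools/example-category">More tools</a>
"""


class AttrLinkCollector(HTMLParser):
    """Collect values of the given attributes, but only on the given tags."""

    def __init__(self, tags, attrs):
        super().__init__()
        self.tags = set(tags)
        self.attrs = set(attrs)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.tags:
            return  # tag is not scanned, so its attributes are never read
        for name, value in attrs:
            if name in self.attrs and value:
                self.links.append(value)


def collect_links(html, tags, attrs):
    """Return attribute values found on the scanned tags, in document order."""
    parser = AttrLinkCollector(tags, attrs)
    parser.feed(html)
    parser.close()
    return parser.links


# Scanning only <a>/<area> (Scrapy's default tags) misses the data-url values.
default_tags = collect_links(SAMPLE_HTML, ("a", "area"), ("href", "data-url"))
# Adding "li" to the scanned tags picks them up as well.
with_li = collect_links(SAMPLE_HTML, ("a", "area", "li"), ("href", "data-url"))
```

If this is indeed the cause, the corresponding change in the spider would be to pass something like tags=('a', 'area', 'li') to the LinkExtractor in the product rule. That is untested against the live site, and it only helps if the data-url attributes are present in the HTML that Scrapy downloads.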