Question

I'm trying to extract the product names from this page:

https://www.v12outdoor.com/view-by-category/rock-climbing-gear/rock-climbing-shoes/mens.html

I can't find an XPath that returns useful, specific results.

This is my first post, so apologies for such a beginner question :(

import scrapy


class V12Spider(scrapy.Spider):
    name = 'v12'
    start_urls = ['https://www.v12outdoor.com/view-by-category/rock-climbing-gear/rock-climbing-shoes/mens.html']

    def parse(self, response):
        yield {
            'price': response.xpath('//span[@id="product-price-26901"]/text()'),
            'name': response.xpath('//h3[@class="product-name"]/a/text()'),
        }

For name, I expected the items in the h3 tags selected by class to yield the product names, but instead I get multiple rows with data='\r\n

(While we are on product-name: is there a way to extract only the numeric value?)

Answer 1

You can solve this by calling the get() method on the XPath result and then the string's strip() method. I tried something like this:

name= response.xpath('//h3[@class="product-name"]/a/text()').get()

which gives

'\r\n                                RED CHILLI VOLTAGE                            '

Then using

name.strip()

gives

'RED CHILLI VOLTAGE'

So you can replace your name declaration with

name= response.xpath('//h3[@class="product-name"]/a/text()').get().strip()
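One caveat with chaining: get() returns None when the XPath matches nothing, and calling strip() on None raises an AttributeError. A minimal sketch of a guard, assuming a hypothetical helper named clean_text (not part of Scrapy):

```python
def clean_text(value):
    """Strip surrounding whitespace from an extracted string.

    `value` is whatever .get() returned: a str, or None when the
    selector matched nothing. clean_text is a hypothetical helper name.
    """
    if value is None:
        return None
    return value.strip()


# Intended use inside a spider's parse() method (shown as a comment so
# this block runs without Scrapy installed):
#   name = clean_text(response.xpath('//h3[@class="product-name"]/a/text()').get())
```

With this, a page that lacks the element yields None for the field rather than stopping the callback with an exception.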

Use the same approach to get the price: just add .get().strip() to the end of that declaration.
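For the part of the question about pulling out only the numeric value, one option is a regular expression applied to the extracted text. This is a rough sketch: the helper name parse_price is hypothetical, and it assumes the price text looks something like "£89.99", with a dot as the decimal separator and commas only as thousands separators. Check the actual page before relying on it.

```python
import re
from decimal import Decimal

# Matches the first run of digits with an optional decimal part, e.g. "89.99".
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def parse_price(text):
    """Return the first number found in `text` as a Decimal, or None.

    Assumes "." is the decimal separator and "," only groups thousands.
    """
    if not text:
        return None
    # Drop thousands separators so "1,299.00" is read as "1299.00".
    match = _NUMBER.search(text.replace(",", ""))
    return Decimal(match.group()) if match else None


# Intended use inside parse() (comment only, so the block runs without Scrapy):
#   price = parse_price(response.xpath('//span[@id="product-price-26901"]/text()').get())
```

Decimal is used here instead of float so that prices keep their exact decimal representation.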

Hope this helps. You can also read about the .getall() method at https://docs.scrapy.org/en/latest/topics/selectors.html.
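Since the page lists many products, getall() is likely what you want for the names: it returns a list with one string per match instead of only the first. A small sketch of cleaning such a list, assuming a hypothetical helper named clean_all:

```python
def clean_all(values):
    """Strip each extracted string and drop the ones that end up empty.

    `values` is the list returned by .getall(). clean_all is a
    hypothetical helper name, not a Scrapy function.
    """
    stripped = (value.strip() for value in values)
    return [value for value in stripped if value]


# Intended use inside parse() (comment only, so the block runs without Scrapy):
#   names = clean_all(response.xpath('//h3[@class="product-name"]/a/text()').getall())
```

Dropping empty strings is a design choice that suits whitespace-only text nodes; remove the filter if positions in the list need to line up with another list, such as prices.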
