Question

I'm new to Scrapy and am experimenting with the shell, trying to retrieve the products from the following URL: https://www.newbalance.co.nz/men/shoes/running/trail/?prefn1=sizeRefinement&prefv1=men_shoes_7

Here is my code. I'm not sure why the final query returns an empty result:

$ scrapy shell

fetch("https://www.newbalance.co.nz/men/shoes/running/trail/?prefn1=sizeRefinement&prefv1=men_shoes_7")
div_product_lists = response.xpath('//div[@id="product-lists"]')
ul_product_list_main = div_product_lists.xpath('//ul[@id="product-list-main"]')
for li_tile in ul_product_list_main.xpath('//li[@class="tile"]'):
...    print li_tile.xpath('//div[@class="product"]').extract()
...
[]
[]

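If I inspect the page with the element inspector, I can see data in the div (class product), so I'm not sure why this comes back empty. Any help would be appreciated.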

Answer 1

The data you want to extract is easier to reach in a different set of divs, the ones with the class product-top-spacer.

For example, you can get all divs with class="product-top-spacer" like this:

ts = response.xpath('//div[@class="product-top-spacer"]')

Then check the item name and price in the first extracted div:

ts[0].xpath('descendant::p[@class="product-name"]/a/text()').extract()[0]
>> 'Leadville v3'

ts[0].xpath('descendant::div[@class="product-pricing"]/text()').extract()[0].strip()
>> '$260.00'

You can see all the items by iterating over ts:

for t in ts:
    itname = t.xpath('descendant::p[@class="product-name"]/a/text()').extract()[0]
    itprice = t.xpath('descendant::div[@class="product-pricing"]/text()').extract()[0].strip()
    itprice = ' '.join(itprice.split()) # some cleaning
    print(itname + ", " + itprice)
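
One thing to keep in mind with the loop above: extract()[0] raises an IndexError as soon as one tile lacks a name or a price. A minimal sketch of a guard follows, assuming the input is the plain list of strings that extract() returns. The helper name first_clean and the sample strings are hypothetical, not part of Scrapy or of the page.

```python
def first_clean(values, default=""):
    """Return the first string in `values` with runs of whitespace collapsed.

    `values` is assumed to be a list of strings such as the one returned by
    .extract(); `default` is returned when the list is empty.
    """
    if not values:
        return default
    # split() with no argument drops leading/trailing whitespace and splits
    # on any run of whitespace, so join() leaves single spaces only.
    return " ".join(values[0].split())


# Hypothetical raw values, standing in for what .extract() might return.
print(first_clean(["\n      $260.00\n   "]))
print(repr(first_clean([], default="n/a")))
```

Inside the loop this could be used as itprice = first_clean(t.xpath(...).extract()). Scrapy selectors also provide extract_first(), which returns None instead of raising when nothing matches.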

Answer 2

The problem here is that XPath does not understand that class="product product-tile " means "this element has two classes, product and product-tile".
To an XPath selector, the class attribute is just a string like any other.

Knowing this, you can search for the entire class string:

>>> li_tile.xpath('.//div[@class="product product-tile "]')
[<Selector xpath='.//div[@class="product product-tile "]' data='<div class="product product-tile " tabin'>]
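
The exact-string behaviour can be reproduced outside Scrapy. The sketch below uses Python's standard-library ElementTree as a stand-in for Scrapy selectors, on a hypothetical, simplified fragment modelled on the class string shown above. It is not the real page markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; only the class string is taken from the
# selector output shown above.
SAMPLE = """
<ul>
  <li class="tile">
    <div class="product product-tile ">Leadville v3</div>
  </li>
</ul>
"""

root = ET.fromstring(SAMPLE)

# Comparing @class with "product" fails: the attribute value is a longer string.
exact_short = root.findall('.//div[@class="product"]')

# Comparing with the complete string, trailing space included, succeeds.
exact_full = root.findall('.//div[@class="product product-tile "]')

print(len(exact_short), len(exact_full))
```

The first query finds nothing and the second finds the div, which matches what the original loop and the query above show in the Scrapy shell.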

If you want to find all elements that have the "product" class, the easiest way is to use a CSS selector:

>>> li_tile.css('div.product')
[<Selector xpath="descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' product ')]" data='<div class="product product-tile " tabin'>]

As you can see from the generated Selector, achieving the same thing with XPath alone is a bit more complicated.
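
If you would rather stay with XPath, the expression can be built by hand along the lines of the generated Selector above. This is a minimal sketch: the helper name has_class_xpath is hypothetical, and it assumes the tag and class name contain no quote characters.

```python
def has_class_xpath(tag, class_name):
    """Build a relative XPath matching `tag` elements that carry `class_name`
    as one of their whitespace-separated classes (hypothetical helper)."""
    # Padding both the attribute value and the class name with spaces makes
    # ' product ' match "product product-tile " but not "product-tile" alone.
    return (
        ".//{tag}[contains(concat(' ', normalize-space(@class), ' '), "
        "' {cls} ')]"
    ).format(tag=tag, cls=class_name)


print(has_class_xpath("div", "product"))
```

In the shell, li_tile.xpath(has_class_xpath("div", "product")) should then select the same div as li_tile.css('div.product'), provided the page markup is as shown above. The leading .// keeps the query relative to li_tile.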

Not sure why the XPath query returns empty
