Question

Python 3.6

There are two price tags, one of them inside an element with rel="nofollow", and my code outputs the same result for both:

 1. <span class="price" id="product-price-013">
        $0.50
    </span>

 2. <div class="price-box" rel="nofollow">
        <span class="regular-price" id="product-price-01">
            <span class="price">$76</span>
        </span>

 3. The name HTML: <div class="oh h89"><a href="https://example.com/11.html" title="11">11</a></div>

My code:

def parse_product(self, response):
    for detail in response.xpath("//div[@class='oh h89']"):
        item = exampleItem()
        # The name differs per item. I also tried //a instead of a; it gave the same result.
        item['name'] = detail.xpath("a/text()")[0].extract()
        # The price is the same for every item. I can't drop the //, because the element with rel=nofollow sits in between.
        item['price'] = str((detail.xpath("//span[starts-with(@id, 'product-price-')]")).xpath('string(.)').extract()[0]).strip()
        yield item

Edit:

This yields different names but the same price. On my machine the output looks like this:

31157P00, Version B
$75.99
30981P00, Version A
$75.99
710-050100-049
$75.99
8 Keys, B Stock
$75.99

I want to get a different price for each item. I have been trying for two days and I'm quite confused. Thanks.

Answer 1

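Your XPath //span[starts-with(@id, 'product-price-')] begins with //, which means it searches the whole document from the root, no matter which node you call it on. That is why every item gets the first price on the page.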

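Change it to span[starts-with(@id, 'product-price-')], which searches relative to the current node. Note that this form only matches direct children; if the span is nested deeper (for example inside the div with rel="nofollow"), use .//span[starts-with(@id, 'product-price-')], which matches any descendant of the current node.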

def parse_product(self, response):
    for detail in response.xpath("//div[@class='oh h89']"):
        item = exampleItem()
        item['name'] = detail.xpath("a/text()")[0].extract()  # relative path: differs per item
        item['price'] = str((detail.xpath("span[starts-with(@id, 'product-price-')]")).xpath('string(.)').extract()[0]).strip()  # relative path: searched under detail only
        yield item

PS:

I'm not sure how detail.xpath("a/text()")[0].extract() works; I think it should be detail.xpath("a/text()").extract()[0] or detail.xpath("a/text()").extract_first().

