How can Scrapy avoid rel="nofollow", which causes nothing to be returned?

Time: 2018-02-05 13:17:17

Tags: python python-3.x for-loop xpath scrapy

I am using Python 3.6.

There are two price tags, one of which has rel="nofollow", and my code outputs the same result for every item:

 1. <span class="price" id="product-price-013">
        $0.50
    </span>

 2. <div class="price-box" rel="nofollow">
        <span class="regular-price" id="product-price-01">
            <span class="price">$76</span>
        </span>

 3. Name HTML: <div class="oh h89"><a href="https://example.com/11.html" title="11">11</a></div>

My code:

def parse_product(self, response):
    for detail in response.xpath("//div[@class='oh h89']"):
        item = exampleItem()
        # The name differs per item. Adding "//" (i.e. "//a") also produced the same result.
        item['name'] = detail.xpath("a/text()")[0].extract()
        # The price is the same for every item. I can't remove the "//",
        # because there is an element with rel=nofollow in between.
        item['price'] = str((detail.xpath("//span[starts-with(@id, 'product-price-')]")).xpath('string(.)').extract()[0]).strip()
        yield item

Edit:

This yields different names but the same price. On my computer the output looks like this:

31157P00, Version B
$75.99
30981P00, Version A
$75.99
710-050100-049
$75.99
8 Keys, B Stock
$75.99

I expect to get a different price for each item. I have been trying for two days and I am quite confused. Thanks.

1 Answer:

Answer 0: (score: 0)

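Your XPath //span[starts-with(@id, 'product-price-')] begins with //, which means the search starts from the root of the whole document, not from the current detail node. That is why every item gets the first price on the page.

Change it to span[starts-with(@id, 'product-price-')], which searches relative to the current node.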

def parse_product(self, response):
    for detail in response.xpath("//div[@class='oh h89']"):
        item = exampleItem()
        # Relative path: the name is looked up inside the current detail node.
        item['name'] = detail.xpath("a/text()")[0].extract()
        # No leading "//", so the price is also looked up relative to detail.
        item['price'] = str((detail.xpath("span[starts-with(@id, 'product-price-')]")).xpath('string(.)').extract()[0]).strip()
        yield item

PS:

I don't know how detail.xpath("a/text()")[0].extract() works; I think it should be detail.xpath("a/text()").extract()[0] or detail.xpath("a/text()").extract_first().