Python 3.6
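There are two kinds of price markup, one of which sits inside an element with rel="nofollow", and my code outputs the same result for both: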
1. <span class="price" id="product-price-013">
$0.50 </span>
2. <div class="price-box" rel="nofollow">
<span class="regular-price" id="product-price-01">
<span class="price">$76</span> </span>
3. The HTML for the name: <div class="oh h89"><a href="https://example.com/11.html" title="11">11</a></div>
My code:
    def parse_product(self, response):
        for detail in response.xpath("//div[@class='oh h89']"):
            item = exampleItem()
            # Gives a different name per item. I also tried adding // (i.e. //a), and with that I got the same result too.
            item['name'] = detail.xpath("a/text()")[0].extract()
            # Gives the same price every time. I can't drop the //, because there is a rel=nofollow element in between.
            item['price'] = str((detail.xpath("//span[starts-with(@id, 'product-price-')]")).xpath('string(.)').extract()[0]).strip()
            yield item
Edit:
This produces different names but the same price every time. On my machine the output looks like this:
31157P00, Version B
$75.99
30981P00, Version A
$75.99
710-050100-049
$75.99
8 Keys, B Stock
$75.99
I want to get a different price for each item. I have been trying for two days and I'm quite confused. Thanks.
Answer 0 (score: 0)
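Your XPath //span[starts-with(@id, 'product-price-')] begins with //, which means it searches the whole document, not just the current detail node.
Change it to span[starts-with(@id, 'product-price-')], which searches relative to the current node.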
    def parse_product(self, response):
        for detail in response.xpath("//div[@class='oh h89']"):
            item = exampleItem()
            # Already relative to detail, so each item gets its own name.
            item['name'] = detail.xpath("a/text()")[0].extract()
            # No leading //, so the span is looked up relative to detail instead of in the whole document.
            item['price'] = str((detail.xpath("span[starts-with(@id, 'product-price-')]")).xpath('string(.)').extract()[0]).strip()
            yield item
PS:
I don't see how detail.xpath("a/text()")[0].extract() works. I think it should be detail.xpath("a/text()").extract()[0]
or detail.xpath("a/text()").extract_first()