Trying to get a currency value from some HTML using Scrapy. Code:
links = hxs.select('//a[@class="product-image"]/div[@class="price-box"]//span[@class="price"]/text()').extract()
And the HTML:
<div>
<span>
<sub>
<li class="item first">
<a href="http://www.xtra-vision.ie/dvd-blu-ray/to-rent/new-release/dvd/pitch-perfect-dvd.html" title="Image for Pitch Perfect" class="product-image">
<span class="exclusive-star">
</span>
<img src="http://www.xtra-vision.ie/media/catalog/product/cache/3/small_image/124x173/5b02ab93946615b958c913185aae2414/i/w/iws_5167c10c906b57.33524324.JPG.jpg" alt="Image for Pitch Perfect" />
<h2 class="product-name">Pitch Perfect</h2>
<div class="price-box">
<span class="regular-price" id="product-price-5174">
<span class="price">
€15
<sub class="price-bit">.99</sub>
</span>
</span>
</div>
</a>
</li>
</sub>
</span>
</div>
For the price I get \u20ac15\t\t\t\t\t\t. Is there any way to extract 15.99 from this HTML using XPath?

Answer 0 (score: 0)
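I used a combination of XPath and Python, so this may not be quite what you're after, although the Python part is mostly there to get rid of the extraneous tabs added to the end of the "price".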
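# Text directly inside <span class="price"> (the "€15" part plus surrounding whitespace)
price = hxs.select('//span[@class="price"]/text()').extract()
# Text inside the nested <sub class="price-bit"> (the ".99" part)
pricebit = hxs.select('//span[@class="price"]/sub[@class="price-bit"]/text()').extract()
# Both results are lists of strings, so + concatenates them
totalprice = price + pricebit
# Join the pieces into one string and strip the tab characters
totalstr = ''.join(totalprice).replace('\t','')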
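
The reason the original query only returns the "€15" part is that `/text()` selects text nodes that are direct children of the span, while ".99" sits inside the nested `<sub>`. Below is a minimal sketch of one way to end up with a plain "15.99". It is not Scrapy code: to keep it runnable on its own, the standard library's `xml.etree.ElementTree` stands in for the selector, and `SNIPPET` and `extract_price` are names made up for this illustration. `SNIPPET` is a trimmed, well-formed copy of the price markup from the question. In Scrapy itself, `//span[@class="price"]//text()` (double slash) should collect the same descendant text nodes.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, trimmed copy of the price markup from the question.
# The tabs mimic the whitespace seen in the extracted value.
SNIPPET = """
<div class="price-box">
  <span class="regular-price" id="product-price-5174">
    <span class="price">
\t\t\t€15
\t\t\t<sub class="price-bit">.99</sub>
    </span>
  </span>
</div>
"""


def extract_price(markup):
    """Return the price as a string such as '15.99', or None if not found."""
    root = ET.fromstring(markup.strip())
    span = root.find(".//span[@class='price']")
    if span is None:
        return None
    # itertext() walks every descendant text node (similar to //text()
    # in XPath), so the ".99" inside <sub> is included.
    raw = "".join(span.itertext())
    # Remove all whitespace (tabs, newlines, spaces).
    compact = re.sub(r"\s+", "", raw)
    # Keep only the numeric part, dropping the currency symbol.
    match = re.search(r"\d+(?:\.\d+)?", compact)
    return match.group(0) if match else None


print(extract_price(SNIPPET))  # prints 15.99
```

The result is still a string; wrapping it in `decimal.Decimal(...)` is probably the safer choice if you need to do arithmetic on prices.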