Extracting a currency value with scrapy / xpath

Time: 2013-04-16 10:01:29

Tags: scrapy

Trying to get a currency value from some HTML using scrapy. Code:

links = hxs.select('//a[@class="product-image"]/div[@class="price-box"]//span[@class="price"]/text()').extract()

And the HTML:

<div>
  <span>
    <sub>
      <li class="item first">

        <a href="http://www.xtra-vision.ie/dvd-blu-ray/to-rent/new-release/dvd/pitch-perfect-dvd.html" title="Image for Pitch Perfect" class="product-image">

          <span class="exclusive-star">
          </span>

          <img src="http://www.xtra-vision.ie/media/catalog/product/cache/3/small_image/124x173/5b02ab93946615b958c913185aae2414/i/w/iws_5167c10c906b57.33524324.JPG.jpg"  alt="Image for Pitch Perfect" />

          <h2 class="product-name">Pitch Perfect</h2>

          <div class="price-box">

            <span class="regular-price" id="product-price-5174">

              <span class="price">
                €15                     
                <sub class="price-bit">.99</sub>
              </span>
            </span>
          </div>
        </a>
      </li>
    </sub>

  </span>

</div>

The price I get back is \u20ac15\t\t\t\t\t\t. Is there any way to use xpath to extract 15.99 from this HTML?

1 answer:

Answer 0: (score: 0)

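I used a combination of xpath and Python, so it may not be quite what you are after, although this mainly serves to get rid of the extraneous tabs appended to the end of the "price".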

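# Text nodes directly inside <span class="price"> (the euro part plus whitespace)
price = hxs.select('//span[@class="price"]/text()').extract()
# Text of the nested <sub class="price-bit"> (the cents part)
pricebit = hxs.select('//span[@class="price"]/sub[@class="price-bit"]/text()').extract()
# Both results are lists, so + concatenates them
totalprice = price + pricebit
# Join the pieces into one string and remove the tabs
totalstr = ''.join(totalprice).replace('\t','')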
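
The joined string above would still carry the euro sign and any newlines or spaces from the markup, so it is not yet the bare 15.99 the question asks for. A minimal sketch of one way to finish the cleanup in plain Python follows. The helper name parse_price and the sample fragments are hypothetical; the fragments are only shaped like what extract() appears to return for the HTML in the question.

```python
import re
from decimal import Decimal


def parse_price(fragments):
    """Join extracted text fragments and return the first number found.

    parse_price is a hypothetical helper, not part of scrapy.
    """
    # Join the fragments and drop all whitespace (tabs, newlines, spaces).
    text = re.sub(r"\s+", "", "".join(fragments))
    # Take digits with an optional fractional part; the currency sign is skipped.
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError("no numeric price in %r" % text)
    return Decimal(match.group())


# Assumed sample input: the span's own text nodes, then the <sub> text.
fragments = [u"\n\u20ac15\t\t\t\t\t\t\n", u"\n", u".99"]
price = parse_price(fragments)
print(price)  # 15.99
```

Returning a Decimal rather than a float keeps the amount exact, which tends to matter for money. With the answer's variables, the call would presumably be parse_price(price + pricebit).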