Scrapy: How to extract content from a nested div (XPath selector)?

Time: 2017-04-02 20:35:18

Tags: python xpath scrapy web-crawler

See the HTML markup below. How can I use an XPath selector in Scrapy to extract the content of the div with the class name col-sm-7?

I want to extract this text:

Infortrend EonNAS Pro 850X 8-bay Tower NAS with 10GbE

HTML:

<div class="pricing panel panel-primary">
   <div class="panel-heading">Infortrend Products</div>
   <div class="body">
    <div class="panel-subheading"><strong>EonNAS Pro Models</strong></div>
    <div class="row">
     <div class="col-sm-7"><strong>Infortrend EonNAS Pro 850X 8-bay Tower NAS with 10GbE</strong><br />
      <small>Intel Core i3 Dual-Core 3.3GHz Processor, 8GB DDR3 RAM (Drives Not Included)</small></div>
     <div class="col-sm-3">#ENP8502MD-0030<br />
      <strong> Our Price: $2,873.00</strong></div>
     <div class="col-sm-2">
      <form action="/addcart.asp" method="get">
       <input type="hidden" name="item" value="ENP8502MD-0030 - Infortrend EonNAS Pro 850X 8-bay Tower NAS with 10GbE (Drives Not Included)">
       <input type="hidden" name="price" value="$2873.00">
       <input type="hidden" name="custID" value="">
       <input type="hidden" name="quantity" value="1">
       <button type="submit" class="btn btn-primary center-block"><i class="fa fa-shopping-cart"></i> Add to Cart</button>
      </form>
     </div>
    </div>
   </div>
  </div>

I tried this command, but it does not work:

response.xpath('//div[@class="pricing panel panel-primary"]/div[@class="panel-heading"]/text()/div[@class="body"]//div[@class="panel-subheading" and contains(@style,'font-weight:bold')]/text()').extract_first()

2 Answers:

Answer 0 (score: 1)

Try this:

response.xpath('//*[@class="col-sm-7"]//strong//text()').extract()

Hope that helps :)

Answer 1 (score: 1)

You can get the text inside the <strong> element like this:

print(response.xpath('//div[@class="col-sm-7"]//text()').extract()[0].strip())

print(response.xpath('//div[@class="col-sm-7"]/strong/text()').extract()[0].strip())

Both statements above output:

Infortrend EonNAS Pro 850X 8-bay Tower NAS with 10GbE

You can use //text() to get the text of every element inside this div, including the text inside the <strong> and <small> tags, like this:

elem_text = ' '.join([txt.strip() for txt in response.xpath('//div[@class="col-sm-7"]//text()').extract()])
print(elem_text)

This outputs:

Infortrend EonNAS Pro 850X 8-bay Tower NAS with 10GbE  Intel Core i3 Dual-Core 3.3GHz Processor, 8GB DDR3 RAM (Drives Not Included)