How to selectively scrape HTML with repeated class IDs

Date: 2015-12-04 15:55:07

Tags: python html xpath scraperwiki

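I'm new to Python and have searched Stack Overflow in vain for an answer I can understand. Thanks in advance for any help or advice.

I'm trying to scrape price and location information from a house-sales website, i.e. the information contained in the 'field-content' tags.

The problem is that the page has many 'field-content' tags, and the main code I'm trying pulls out and prints what looks like a random selection of them.

Thanks in advance for your help.

This is what I'm trying to scrape: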

<div class="view-content">
<div class="views-row views-row-1 views-row-odd views-row-first views-row-last">
        <div class="views-field views-field-field-summary">        
<div class="field-content">
Land for sale in Prestatyn, Flintshire. Three acres of land with outline planning permission for three large, 4 bedroomed detached houses.
</div> 
 </div>  
         <div class="views-field views-field-field-price">    
<span class="views-label views-label-field-price">PRICE: </span>   
 <span class="field-content">£297,500</span>  
</div>  

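This is my basic attempt at getting it to return the price. I haven't got very far, and there is still a long way to go from just scraping the price to saving it to a ScraperWiki table!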

#!/usr/bin/env python

from lxml import html
import requests

page = requests.get('http://www.plotfinder.net/plot/plot-jaslin')
tree = html.fromstring(page.content)

Type1 = tree.xpath('//span[@class="views-label views-label-field-price"]/text()')
price = tree.xpath('//span[@class="field-content"]/text()')

print('Type1:', Type1)
print('price:', price)

1 answer:

Answer 0 (score: 0)

You can try this:

from lxml import html
import requests

page = requests.get('http://www.plotfinder.net/plot/plot-jaslin')
tree = html.fromstring(page.content)

Type1 = tree.xpath('//span[contains(@class,"field-price")]/text()')
price = tree.xpath('//span[contains(@class,"field-price")]/following-sibling::span[contains(@class,"field-content")][1]/text()')


print('Type1:', Type1)
print('price:', price)
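
The original query looked random because `//span[@class="field-content"]` matches every span on the page whose class is exactly `field-content`, not only the price. Here is a minimal offline sketch of the difference, assuming a trimmed copy of the markup from the question. The LOCATION field is a hypothetical addition, included only to give the page a second span with the same class.

```python
from lxml import html

# Trimmed markup from the question. The location field is hypothetical,
# added only to show a second span with the same class.
SAMPLE = """
<div class="view-content">
  <div class="views-field views-field-field-location">
    <span class="views-label views-label-field-location">LOCATION: </span>
    <span class="field-content">Prestatyn</span>
  </div>
  <div class="views-field views-field-field-price">
    <span class="views-label views-label-field-price">PRICE: </span>
    <span class="field-content">£297,500</span>
  </div>
</div>
"""

tree = html.fromstring(SAMPLE)

# Original query: matches every span whose class is exactly "field-content".
all_fields = tree.xpath('//span[@class="field-content"]/text()')

# Scoped query: start at the price label, then take its first sibling span.
price = tree.xpath(
    '//span[contains(@class, "field-price")]'
    '/following-sibling::span[contains(@class, "field-content")][1]/text()'
)

print('all fields:', all_fields)
print('price:', price)
```

On this sample the first query returns both the location and the price, while the scoped query returns only the price.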

Hope you get the result you want.
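
If you want the summary and the price kept together for each listing, one option is to loop over each `views-row` and run relative XPath queries (starting with `.//`) inside it. This is only a sketch against the markup shown in the question: `first_text` and `extract_rows` are hypothetical helper names, the summary text is shortened, and the real page may be structured differently.

```python
from lxml import html

# Markup based on the snippet in the question (summary text shortened).
SAMPLE = """
<div class="view-content">
  <div class="views-row views-row-1 views-row-odd views-row-first views-row-last">
    <div class="views-field views-field-field-summary">
      <div class="field-content">
        Land for sale in Prestatyn, Flintshire.
      </div>
    </div>
    <div class="views-field views-field-field-price">
      <span class="views-label views-label-field-price">PRICE: </span>
      <span class="field-content">£297,500</span>
    </div>
  </div>
</div>
"""


def first_text(node, path):
    """Return the stripped text of the first match of a relative XPath, or None."""
    matches = node.xpath(path)
    return matches[0].text_content().strip() if matches else None


def extract_rows(tree):
    """Build one dict per views-row, querying relative to that row only."""
    rows = []
    for row in tree.xpath('//div[contains(@class, "views-row")]'):
        rows.append({
            'summary': first_text(
                row,
                './/div[contains(@class, "field-summary")]'
                '//*[contains(@class, "field-content")]'),
            'price': first_text(
                row,
                './/div[contains(@class, "field-price")]'
                '//*[contains(@class, "field-content")]'),
        })
    return rows


rows = extract_rows(html.fromstring(SAMPLE))
print(rows)
```

Each dict holds the fields of a single listing, so a price cannot be paired with the wrong summary. The list can then be handed to whatever storage step you choose.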