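I have an HTML document that I am trying to parse with XPath, but I only get empty results. Can anyone tell me where I am going wrong? I have tried everything I can think of without success.
XPath for the labels: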
divLbl=ch.xpath("//div[@class='left-container']/article/ul[@class='list-unstyled row']/li[@class='col-sm-6 mrg-bottom']/span[@class='text-light']")
XPath for the corresponding values:
divVal=ch.xpath("//div[@class='left-container']/article/ul[@class='list-unstyled row']/li[@class='col-sm-6 mrg-bottom']/span[@class='text-light']/strong")
HTML:
<div>
<h2 class="rowbreak"><strong>Information of the Car</strong></h2>
<ul class=" list-unstyled row">
<li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">Make Year:</span> <strong>Aug 2009</strong></li>
<li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-road text-light"></span> <span class=" text-light">Kilometers:</span> <strong>127,553</strong></li>
<li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-map-marker text-light"></span> <span class=" text-light">City:</span>
<strong class="carCity_795606">
<a href="javascript:void(0);" onclick="javascript: $( "#maplinkbtn" ).trigger( "click" ); ">
Sambalpur </a>
</strong>
</li>
<li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">Listing Date:</span> <strong>27 Apr 2015</strong></li>
<li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">No. of Owners:</span> <strong> First Owner</strong>
</li>
<li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-tint text-light"></span> <span class=" text-light">Fuel Type:</span> <strong> Petrol</strong></li>
<li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">Posted by:</span> <strong>
Dealer</strong>
</li>
</ul>
</div>
Edited HTML:
<div>
<h2 class="rowbreak"><strong>Information of the Car</strong></h2>
<ul class=" list-unstyled row">
<li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">Make Year:</span> <strong>Aug 2009</strong></li>
<li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-road text-light"></span> <span class=" text-light">Kilometers:</span> <strong>127,553</strong></li>
<li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-map-marker text-light"></span> <span class=" text-light">City:</span>
<strong class="carCity_795606">
<a href="javascript:void(0);" onclick="javascript: $( "#maplinkbtn" ).trigger( "click" ); ">
Sambalpur </a>
</strong>
</li>
<li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">Listing Date:</span> <strong>27 Apr 2015</strong></li>
<li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">No. of Owners:</span> <strong> First Owner</strong>
</li>
<li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-tint text-light"></span> <span class=" text-light">Fuel Type:</span> <strong> Petrol</strong></li>
<li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-user text-light"></span> <span class=" text-light">Posted by:</span> <strong>
Dealer</strong>
</li>
</ul>
</div>
<h2 class="rowbreak"></h2>
<ul class=" list-unstyled row">
<li class="col-sm-6 mrg-bottom"><span class=" text-light">One Time Tax :</span> <strong>Individual</strong></li>
<li class="col-sm-6 mrg-bottom"><span class=" text-light">Registration No. :</span> <strong>OR03F3141</strong></li>
<li class="col-sm-6 mrg-bottom"><span class=" text-light"> Insurance & Expiry :</span> <strong>No Insurance </strong></li>
<li class="col-sm-6 mrg-bottom"><span class=" text-light">Registration Place: </span> <strong> Sambalpur</strong></li>
<li class="col-sm-6 mrg-bottom"><span class=" text-light">Transmission :</span> <strong>Manual</strong></li>
<li class="col-sm-6 mrg-bottom"><span class=" text-light">Color :</span> <strong>Silver</strong></li>
</ul>
Answer 0 (score: 3)
The XPath you are currently using is very fragile: it checks every element in the chain and relies on layout-oriented classes.
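To see why the original expressions come back empty: `@class='...'` compares the whole attribute string, and in the posted HTML the values carry a leading space (`" list-unstyled row"`, `" text-light"`). The `strong` elements are also siblings of the `span`, not children of it, and the posted snippet shows no `div` with class `left-container` or `article` ancestor. Below is a minimal sketch of the first two points. As an assumption made so that it runs without extra packages, it uses the standard library's `xml.etree.ElementTree` on a trimmed, well-formed copy of the markup instead of lxml; lxml elements offer the same `findall` method.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the posted markup (two rows only).
SNIPPET = """
<div>
  <ul class=" list-unstyled row">
    <li class="col-sm-6 mrg-bottom"><span class=" text-light">Color :</span> <strong>Silver</strong></li>
    <li class="col-sm-6 mrg-bottom"><span class=" text-light">Transmission :</span> <strong>Manual</strong></li>
  </ul>
</div>
"""

root = ET.fromstring(SNIPPET)

# Exact comparison: the real value starts with a space, so nothing matches.
exact = root.findall(".//ul[@class='list-unstyled row']")

# Same predicate with the leading space included: the list is found.
with_space = root.findall(".//ul[@class=' list-unstyled row']")

# <strong> is a sibling of <span>, so searching inside the span finds nothing,
nested = root.findall(".//li/span/strong")
# while searching directly under <li> finds both values.
siblings = root.findall(".//li/strong")

print(len(exact), len(with_space), len(nested), len(siblings))
```

Matching the exact attribute string with its stray space would work, but it breaks as soon as the site changes its whitespace or class order, which is one more reason to anchor on content instead.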
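I would start from the h2 element that contains a strong element with the text "Information of the Car" and take the ul element that follows it. For example, to get all the labels: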
//h2[strong = 'Information of the Car']/following-sibling::ul/li/span/text()
Demo:
In [2]: from lxml.html import fromstring  # data below holds the HTML from the question
In [3]: ch = fromstring(data)
In [4]: ch.xpath("//h2[strong = 'Information of the Car']/following-sibling::ul/li/span/text()")
['Make Year:', 'Kilometers:', 'City:', 'No. of Owners:', 'Fuel Type:', 'Posted by:']
Example (getting names and values):
In [25]: for field in ch.xpath("//h2/following-sibling::ul/li"):
   ....:     name = ''.join(field.xpath(".//span/text()")).strip()
   ....:     value = ''.join(field.xpath(".//strong//text()")).strip()
   ....:     print(name, value)
   ....:
Make Year: Aug 2009
Kilometers: 127,553
City: Sambalpur
Listing Date: 27 Apr 2015
No. of Owners: First Owner
Fuel Type: Petrol
Posted by: Dealer
One Time Tax : Individual
Registration No. : OR03F3141
Insurance & Expiry : No Insurance
Registration Place: Sambalpur
Transmission : Manual
Color : Silver
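If the fields are needed as data rather than printed, the same loop can fill a dictionary. This is only a sketch under stated assumptions: `parse_fields` is a hypothetical helper name, the markup is a trimmed, well-formed copy of the question's HTML (the `href` is simplified), and it uses the standard library's `xml.etree.ElementTree` rather than lxml so that it runs without extra packages. lxml elements expose the same `iter`, `find`, `findall`, and `itertext` methods.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the posted markup (two rows only).
SNIPPET = """
<div>
  <h2 class="rowbreak"><strong>Information of the Car</strong></h2>
  <ul class=" list-unstyled row">
    <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-calendar text-light"></span> <span class=" text-light">Make Year:</span> <strong>Aug 2009</strong></li>
    <li class="col-sm-4 mrg-bottom"><span class="glyphicon glyphicon-map-marker text-light"></span> <span class=" text-light">City:</span>
      <strong class="carCity_795606">
        <a href="#">
          Sambalpur </a>
      </strong>
    </li>
  </ul>
</div>
"""


def parse_fields(root):
    """Map each label (trailing colon and spaces removed) to its value."""
    fields = {}
    for li in root.iter("li"):
        # Join the text of every <span>; the icon spans carry no text.
        name = "".join(span.text or "" for span in li.findall("span")).strip()
        strong = li.find("strong")
        if not name or strong is None:
            continue
        # itertext() also collects text nested in <a>, as for the city.
        value = "".join(strong.itertext()).strip()
        fields[name.rstrip(": ")] = value
    return fields


fields = parse_fields(ET.fromstring(SNIPPET))
print(fields)
```

Stripping the trailing colon from the label is a design choice for cleaner keys; drop the `rstrip` call to keep the labels exactly as they appear on the page.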