I am writing a simple Python web crawler and am trying to filter a page with XPath. Here is part of the target page:
<p class="mt12">21
<span class="line">|</span>low 18
<span class="line">|</span>north
<span class="line">|</span>2016
</p>
<p class="mt12">22
<span class="line">|</span>low 19
<span class="line">|</span>2018
</p>
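Some items have three descriptions and some have four. How can I use XPath to check whether an element exists? I want to extract all the descriptions, for example: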
# if element four exists get span four
good['year'] = goods.xpath("p[@class='mt12']/text()[4]")
# else get span three
good['year'] = goods.xpath("p[@class='mt12']/text()[3]")
Answer 0 (score: 1)
Try the following code:
good['year'] = goods.xpath("p[@class='mt12']/text()[4]") or goods.xpath("p[@class='mt12']/text()[3]")
This should return the result of text()[4] if it exists (that is, the result is non-empty), and the result of text()[3] otherwise.
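To make the fallback behaviour concrete, here is a minimal sketch that does not call any XPath library. It assumes that the `xpath()` call returns a list-like result that is empty (and therefore falsy) when nothing matches. The helper `nth_text` and the two sample lists are hypothetical stand-ins for the text nodes of the two `<p>` elements in the question.

```python
def nth_text(nodes, n):
    """Hypothetical stand-in for text()[n]: a list holding the n-th node, or [] if absent."""
    # Slicing never raises; it yields an empty list when position n does not exist.
    return nodes[n - 1:n]


# Text nodes of the two sample <p> elements, already stripped of whitespace.
four_nodes = ["21", "low 18", "north", "2016"]
three_nodes = ["22", "low 19", "2018"]

# `or` returns its left operand when that operand is truthy (non-empty),
# and otherwise evaluates and returns the right operand.
year_a = nth_text(four_nodes, 4) or nth_text(four_nodes, 3)
year_b = nth_text(three_nodes, 4) or nth_text(three_nodes, 3)

print(year_a)
print(year_b)
```

Note that the order matters: trying position 3 first would pick "north" for the four-item case, so the longer position has to be tried first.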
Update
If the expression "p[@class='mt12']/text()[4]" raises an exception, you can use a try / except block, as shown below:
try:
    good['year'] = goods.xpath("p[@class='mt12']/text()[4]")
except IndexError:
    good['year'] = goods.xpath("p[@class='mt12']/text()[3]")
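Since the year is always the last description, another option is to collect every text node of each `<p>` and take the last one, which avoids checking whether a fourth node exists. The sketch below is not the approach from the answer above: it uses the standard library's `xml.etree.ElementTree` instead of XPath `text()` steps, and it assumes the fragment is well-formed once wrapped in a hypothetical `<div>` root. The helper name `describe` is also made up for illustration. With an XPath 1.0 engine, an expression such as `p[@class='mt12']/text()[last()]` should select the same final node.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, wrapped in a <div> so it has a single root.
html = """<div>
<p class="mt12">21
<span class="line">|</span>low 18
<span class="line">|</span>north
<span class="line">|</span>2016
</p>
<p class="mt12">22
<span class="line">|</span>low 19
<span class="line">|</span>2018
</p>
</div>"""


def describe(p):
    """Return the stripped text nodes that sit directly inside a <p> element."""
    # The direct text of <p> is its leading text plus the tail after each child <span>.
    parts = [p.text] + [child.tail for child in p]
    return [s.strip() for s in parts if s and s.strip()]


root = ET.fromstring(html)
items = [describe(p) for p in root.findall("p[@class='mt12']")]

# The last description is the year, whether there are three or four of them.
years = [parts[-1] for parts in items]

print(items)
print(years)
```

Real-world HTML is often not well-formed XML, so an HTML-aware parser would be the safer choice outside of this small example.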