Python + XPath: Is it possible to select the next element after the one I actually want?

Time: 2011-09-19 02:49:01

Tags: python xpath

Suppose I have something like this:

<span class="filesize">File<a href="http://example.com/image.jpg" target="_blank">image.jpg</a>-(1.61 MB, 1000x1542,
<span title="what the file is actually called.jpg">what the file is actually called.jpg</span>)</span><br>
<a href="http://example.com/image.jpg" target="_blank">

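From this I want to extract http://example.com/image.jpg and what the file is actually called.jpg. The constant part is <span class="filesize">File, which I can find with xpath("span[text()='File']"), but that only gives me access to the span. Is there a way to do something like result += 1 to move on to the link that follows it, and then to the span that holds the filename?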

1 answer:

Answer 0: (score: 2)

You can use the following-sibling and preceding-sibling XPath "axes" to do the navigation you need. The details are in the section on axes in the XPath specification.
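
As a rough illustration of the two axes, here is a minimal sketch using lxml. The tiny document and the variable names are made up for the example and are not taken from the question; the two queries show the same navigation written in either direction.

```python
import lxml.etree

# Hypothetical minimal document, used only to illustrate the axes.
root = lxml.etree.XML(
    '<root><span>File</span><br/>'
    '<a href="http://example.com/image.jpg">link</a></root>'
)

# Forward: find the span first, then step to the first <a> sibling after it.
span = root.xpath("span[text()='File']")[0]
hrefs_forward = span.xpath("following-sibling::a[1]/@href")

# Backward: select the <a> directly, requiring a matching span before it.
hrefs_backward = root.xpath("a[preceding-sibling::span[text()='File']]/@href")

print(hrefs_forward)
print(hrefs_backward)
```

The forward form is closest to the "result += 1" idea in the question: once you hold the span element, you can run a second, relative XPath query from it.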

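Edit

Here is an example that gets the desired result using only XPath. Depending on what the surrounding XML looks like, though, it may not work for you. (I also had to fix up some of the markup to make it "real" XML. You could avoid that by putting the parser into HTML mode...)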

import lxml.etree

# The markup is wrapped in a root element and tidied up so that it parses as XML.
xml = lxml.etree.XML("""<something><span class="filesize">File<a href="http://example.com/image.jpg" target="_blank">image.jpg</a>-(1.61 MB, 1000x1542, <span title="what the file is actually called.jpg">what the file is actually called.jpg</span>)</span><br/><a href="http://example.com/image.jpg" target="_blank"></a></something>""")

# Select the href of each <a> that has a preceding sibling <span> with a text node equal to 'File'.
print(xml.xpath("a[preceding-sibling::span/text()='File']/@href"))
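
For the "HTML mode" route mentioned above, a possible sketch is to let lxml.html parse the original, not quite well-formed markup and then pull out both values the question asks for. This assumes lxml.html.fragment_fromstring with create_parent is an acceptable way to get a single root; the variable names are illustrative only.

```python
import lxml.html

# The markup from the question, left as-is (unclosed <br> and trailing <a>).
html = (
    '<span class="filesize">File'
    '<a href="http://example.com/image.jpg" target="_blank">image.jpg</a>'
    '-(1.61 MB, 1000x1542, '
    '<span title="what the file is actually called.jpg">'
    'what the file is actually called.jpg</span>)</span><br>'
    '<a href="http://example.com/image.jpg" target="_blank">'
)

# Wrap the fragment in a <div> so there is a single root element to query.
root = lxml.html.fragment_fromstring(html, create_parent="div")

# Locate the constant "filesize" span, then navigate relative to it.
filesize = root.xpath(".//span[@class='filesize']")[0]
link = filesize.xpath("a/@href")[0]                             # link inside the span
filename = filesize.xpath("span/@title")[0]                     # real file name
next_link = filesize.xpath("following-sibling::a[1]/@href")[0]  # link after the span

print(link, filename, next_link)
```

Indexing with [0] assumes every piece is present; on real pages it is safer to check that each xpath() result is non-empty first.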