Question

Suppose I have something like this:

<span class="filesize">File<a href="http://example.com/image.jpg" target="_blank">image.jpg</a>-(1.61 MB, 1000x1542, <span title="what the file is actually called.jpg">what the file is actually called.jpg</span>)</span><br><a href="http://example.com/image.jpg" target="_blank">

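I want to extract http://example.com/image.jpg and what the file is actually called.jpg from it. The constant term is File, and I can find it with xpath("span[text()='File']"), but that only gives me access to the span. Is there a way to do something like result += 1 to move on to the link that follows, and then to the span with the file name after it?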

Answer 1

You can use the following-sibling and preceding-sibling XPath "axes" to do the navigation you want. The XPath documentation on axes has the details.
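As a rough illustration of the axis idea, here is a minimal sketch using lxml. The `<div>` wrapper and the flat sibling layout are assumptions made for the example, not the markup from the question; the point is only that, once the "File" span is found, `following-sibling::` steps forward from it much like the `result += 1` the question asks about.

```python
import lxml.etree

# Hypothetical, simplified markup: the label, the link and the name span
# are all siblings inside one parent element.
root = lxml.etree.XML(
    '<div>'
    '<span>File</span>'
    '<a href="http://example.com/image.jpg">image.jpg</a>'
    '<span title="what the file is actually called.jpg">'
    'what the file is actually called.jpg</span>'
    '</div>'
)

# Find the constant label first, as in the question.
label = root.xpath("span[text()='File']")[0]

# First <a> among the siblings that come after the label.
href = label.xpath("following-sibling::a[1]/@href")[0]

# First <span> among the siblings that come after the label.
name = label.xpath("following-sibling::span[1]/@title")[0]

print(href)
print(name)
```

Calling `xpath()` on the element you already found keeps each step relative to it, so the query does not need to know where the label sits in the page.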

Edit

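Here is an example that gets the desired result using only XPath. Depending on what the surrounding XML looks like, though, it may not work for you :( (I also had to fix up some of the markup to make it "real" XML. You can avoid that by putting the parser into HTML mode...)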

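import lxml.etree

# "File" is wrapped in its own <span> here so that the preceding-sibling test has something to match.
xml = lxml.etree.XML("""<something><span>File</span><a href="http://example.com/image.jpg" target="_blank">image.jpg</a>-(1.61 MB, 1000x1542, what the file is actually called.jpg) <a href="http://example.com/image.jpg" target="_blank"></a></something>""")

# Select the href of every <a> that has a preceding sibling <span> whose text is "File".
print(xml.xpath("a[preceding-sibling::span/text()='File']/@href"))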
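The HTML-mode route mentioned above might look like the sketch below. It assumes `lxml.etree.HTML` is an acceptable parser for the page and feeds it the markup from the question unchanged, including the unclosed trailing `<a>`. In that markup the link and the name span are children of the "File" span rather than its siblings, so the query steps into the span first and only then uses `following-sibling`.

```python
import lxml.etree

# The markup from the question, left as-is (the last <a> is never closed).
html = (
    '<span class="filesize">File'
    '<a href="http://example.com/image.jpg" target="_blank">image.jpg</a>'
    '-(1.61 MB, 1000x1542, '
    '<span title="what the file is actually called.jpg">'
    'what the file is actually called.jpg</span>)</span><br>'
    '<a href="http://example.com/image.jpg" target="_blank">'
)

# etree.HTML uses the lenient HTML parser and returns the <html> root element.
doc = lxml.etree.HTML(html)

# The link is a child of the span whose text starts with the "File" label.
link = doc.xpath("//span[text()='File']/a")[0]
href = link.get("href")

# Step from the link to the span that follows it inside the same parent.
title = link.xpath("following-sibling::span[1]/@title")[0]

print(href)
print(title)
```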

Python + XPath: is it possible to select the next element after the one I actually want?
