Question

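I need to write a dynamic function that finds elements in a subtree of an Atom XML document by building each element's XPath dynamically.

To do so, I wrote something like this: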

    tree = etree.parse(xmlFileUrl)
    e = etree.XPathEvaluator(tree, namespaces={'def':'http://www.w3.org/2005/Atom'})
    entries = e('//def:entry')
    for entry in entries:
        mypath = tree.getpath(entry) + "/category"
        category = e(mypath)

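The code above fails to find "category" because getpath() returns an XPath without namespaces, whereas the XPathEvaluator e() needs them.

I know I can use the path and supply the namespaces when calling the XPathEvaluator, but I would like to know whether getpath() can be made to return a "fully qualified" path that includes all namespaces, since that would be convenient in some cases.

(This is a spin-off of an earlier question of mine: Python XpathEvaluator without namespace)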

Answer 1

Rather than trying to build a full path from the root, you can evaluate the XPath expression with the entry as the base node:

tree = etree.parse(xmlFileUrl)
nsmap = {'def':'http://www.w3.org/2005/Atom'}
entries_expr = etree.XPath('//def:entry', namespaces=nsmap)
category_expr = etree.XPath('def:category', namespaces=nsmap)
for entry in entries_expr(tree):
    category = category_expr(entry)

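If performance is not critical, you can simplify the code by calling the .xpath() method on the elements instead of using precompiled expressions: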

tree = etree.parse(xmlFileUrl)
nsmap = {'def':'http://www.w3.org/2005/Atom'}
for entry in tree.xpath('//def:entry', namespaces=nsmap):
    category = entry.xpath('def:category', namespaces=nsmap)
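
To make the relative evaluation concrete, here is a minimal self-contained sketch. The inline feed and the name terms_per_entry are made up for illustration; the only assumption is that lxml is installed. Note that in XPath 1.0 an unprefixed step such as 'category' only matches elements in no namespace, so the Atom elements need the 'def:' prefix together with the prefix mapping.

```python
from lxml import etree

# Hypothetical two-entry Atom feed, inlined so the sketch runs without a file.
ATOM = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>first</title><category term="python"/></entry>
  <entry><title>second</title><category term="xml"/><category term="lxml"/></entry>
</feed>"""

nsmap = {'def': 'http://www.w3.org/2005/Atom'}
root = etree.fromstring(ATOM)

terms_per_entry = []
for entry in root.xpath('//def:entry', namespaces=nsmap):
    # The expression is evaluated relative to the entry, not from the root.
    categories = entry.xpath('def:category', namespaces=nsmap)
    terms_per_entry.append([c.get('term') for c in categories])

    # An unprefixed step matches only elements in no namespace: nothing here.
    assert entry.xpath('category') == []

print(terms_per_entry)  # [['python'], ['xml', 'lxml']]
```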

Answer 2

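Basically, with the standard Python xml.etree library you need a different traversal function. To get the paths, you can build a modified version of the iter method, like this: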

def etree_iter_path(node, tag=None, path='.'):
    if tag == "*":
        tag = None
    if tag is None or node.tag == tag:
        yield node, path
    for child in node:
        _child_path = '%s/%s' % (path, child.tag)
        for child, child_path in etree_iter_path(child, tag, path=_child_path):
            yield child, child_path

You can then use this function to iterate over the tree starting from the root node:

from xml.etree import ElementTree

xmldoc = ElementTree.parse(xml_file_path)  # xml_file_path: path to the XML file
for elem, path in etree_iter_path(xmldoc.getroot()):
    print(elem, path)
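
As a rough illustration of what this produces on a namespaced document, the sketch below repeats the function (written with `yield from`, otherwise the same logic) and runs it on a small made-up Atom snippet. With xml.etree the tag already carries the namespace in `{uri}name` form, so the generated paths are namespace-qualified and can be handed back to find(). The inline document and the name paths are assumptions for the example.

```python
from xml.etree import ElementTree

def etree_iter_path(node, tag=None, path='.'):
    """Yield (element, path) pairs for node and all of its descendants."""
    if tag == "*":
        tag = None
    if tag is None or node.tag == tag:
        yield node, path
    for child in node:
        # child.tag is '{namespace-uri}localname' for namespaced elements.
        yield from etree_iter_path(child, tag, path='%s/%s' % (path, child.tag))

# Hypothetical Atom snippet used only for illustration.
root = ElementTree.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    '<entry><category term="python"/></entry>'
    '</feed>'
)

paths = [path for _, path in etree_iter_path(root)]
for p in paths:
    print(p)
# .
# ./{http://www.w3.org/2005/Atom}entry
# ./{http://www.w3.org/2005/Atom}entry/{http://www.w3.org/2005/Atom}category

# The qualified path can be passed straight back to find().
print(root.find(paths[-1]).get('term'))  # python
```

Keep in mind that these paths carry no positional index, so siblings with the same tag share one path and find() returns only the first of them.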

Answer 3

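From the documentation at http://lxml.de/xpathxslt.html#the-xpath-class:

ElementTree objects have a method getpath(element), which returns a structural, absolute XPath expression to find that element.

So the answer to your question is that getpath() will not return a "fully qualified" path, since otherwise the function would have an argument for that. The only guarantee is that the returned XPath expression will find that element.

You can combine getpath and xpath (and the XPath class) to achieve what you want.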
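
One way to read that suggestion, as a sketch rather than the definitive approach: take the structural path from getpath(), append a prefixed step, and pass the prefix mapping to xpath(). The inline feed and the name terms are made up for the example, and lxml is assumed to be installed.

```python
from lxml import etree

# Hypothetical Atom feed, inlined so the sketch runs without a file.
ATOM = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>demo</title>
  <entry><category term="python"/></entry>
  <entry><category term="xml"/></entry>
</feed>"""

nsmap = {'def': 'http://www.w3.org/2005/Atom'}
tree = etree.ElementTree(etree.fromstring(ATOM))

terms = []
for entry in tree.xpath('//def:entry', namespaces=nsmap):
    # Structural, absolute path that locates exactly this entry.
    entry_path = tree.getpath(entry)
    assert len(tree.xpath(entry_path, namespaces=nsmap)) == 1

    # Extend the path with a prefixed step and pass the prefix mapping along.
    for category in tree.xpath(entry_path + '/def:category', namespaces=nsmap):
        terms.append(category.get('term'))

print(terms)  # ['python', 'xml']
```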

Getting an XPath dynamically with ElementTree getpath()
