Question

Is there a way to make getElementsByTagName look only at a single node level instead of searching recursively?

For example, consider parsing a pom.xml file:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

    <parent>
        <groupId>com.parent</groupId>
        <artifactId>parent</artifactId>
        <version>1.0-SNAPSHOT</version>
        <relativePath>../pom.xml</relativePath>
    </parent>

    <modelVersion>2.0.0</modelVersion>
    <groupId>com.parent.somemodule</groupId>
    <artifactId>some_module</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>Some Module</name>
    ...

If I want the top-level groupId (specifically project->groupId, not project->parent->groupId), I would use:

from xml.dom import minidom

xmldoc = minidom.parse('pom.xml')
groupId = xmldoc.getElementsByTagName("groupId")[0].childNodes[0].nodeValue

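Unfortunately, this returns the first groupId that physically appears in the file, which is project->parent->groupId, regardless of its level in the hierarchy. What I actually want is a non-recursive lookup among the children of one specific node, without descending into their children. Is there a way to do this with xml.dom?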
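A minimal reproduction of the problem, assuming an abbreviated inline copy of the POM (the string POM below is illustrative, not the full file). getElementsByTagName searches every descendant in document order, so the nested parent/groupId is returned first:

```python
from xml.dom import minidom

# Hypothetical, abbreviated copy of the POM above: only the elements that matter here.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <parent>
        <groupId>com.parent</groupId>
    </parent>
    <groupId>com.parent.somemodule</groupId>
</project>"""

xmldoc = minidom.parseString(POM)

# getElementsByTagName walks all descendants in document order,
# so project/parent/groupId comes before project/groupId.
matches = xmldoc.getElementsByTagName("groupId")
values = [m.firstChild.nodeValue for m in matches]
print(values)  # ['com.parent', 'com.parent.somemodule']
```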

Update: I switched to BeautifulSoup but still ran into the same problem of implicit recursive traversal: Finding a nonrecursive DOM subnode in Python using BeautifulSoup

Answer 1

You can iterate over the getElementsByTagName() results and take the first element whose parent is the root element:

group_id_element = next(element for element in xmldoc.getElementsByTagName("groupId")
                        if element.parentNode is xmldoc.documentElement)

print(group_id_element.childNodes[0].nodeValue)
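An alternative sketch that never descends into the tree at all: walk only the direct childNodes of the node you care about and keep the element nodes with the right tag name. The helper name child_elements_by_tag_name and the inline POM string are hypothetical, for illustration only:

```python
from xml.dom import minidom

def child_elements_by_tag_name(node, tag_name):
    """Hypothetical helper: direct element children of `node` named `tag_name` (no recursion)."""
    return [child for child in node.childNodes
            # Skip text/comment nodes first, so .tagName is only read on elements.
            if child.nodeType == child.ELEMENT_NODE and child.tagName == tag_name]

# Abbreviated, hypothetical copy of the POM from the question.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <parent>
        <groupId>com.parent</groupId>
    </parent>
    <groupId>com.parent.somemodule</groupId>
</project>"""

xmldoc = minidom.parseString(POM)
group_ids = child_elements_by_tag_name(xmldoc.documentElement, "groupId")
print(group_ids[0].firstChild.nodeValue)  # com.parent.somemodule
```

Unlike filtering the getElementsByTagName() results by parentNode, this only inspects one level, so it does not scan the whole document.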

Note that doing the same thing with ElementTree, which is also part of the standard library, would be easier, shorter, and faster.
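A sketch of the ElementTree approach, assuming the same abbreviated inline POM. Because the POM declares a default namespace, ElementTree tags look like {http://maven.apache.org/POM/4.0.0}groupId, so a prefix mapping (the "pom" prefix here is an arbitrary choice) is passed to find(). A bare path like "pom:groupId" matches direct children of root only:

```python
import xml.etree.ElementTree as ET

# Arbitrary prefix mapped to the POM's default namespace.
POM_NS = {"pom": "http://maven.apache.org/POM/4.0.0"}

# Abbreviated, hypothetical copy of the POM from the question.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <parent>
        <groupId>com.parent</groupId>
    </parent>
    <groupId>com.parent.somemodule</groupId>
</project>"""

root = ET.fromstring(POM)
# "pom:groupId" is a one-step path: only direct children of <project> are checked.
group_id = root.find("pom:groupId", POM_NS).text
print(group_id)  # com.parent.somemodule
```

For the real file, root = ET.parse('pom.xml').getroot() replaces the fromstring call.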

Hope that helps.

Non-recursive (single node level) getElementsByTagName in Python xml.dom

1 个答案: