How to escape the parent attribute in BeautifulSoup to access a tag actually named <parent>?

Date: 2014-01-15 22:40:30

Tags: python xml dom xml-parsing beautifulsoup

OK, this one is interesting. Here is the XML:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

    <parent>
        <groupId>com.parent</groupId>
        <artifactId>parent</artifactId>
        <version>1.0-SNAPSHOT</version>
        <relativePath>../pom.xml</relativePath>
    </parent>

    <build>
        <sourceDirectory>src</sourceDirectory>
    </build>
</project>

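I want to use BeautifulSoup's simple dot notation to reach the node that is actually named <parent>, but parent is already a built-in attribute of Tag objects in this API (it points to the enclosing node in the tree).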

from bs4 import BeautifulSoup

with open(pom) as pomHandle:
    # "xml" selects lxml's XML parser (requires lxml) and keeps tag-name case
    soup = BeautifulSoup(pomHandle, "xml")

# this returns the proper build node
buildNode = soup.project.build
# this does not return the <parent> child but the tree parent of the project node
# (which is the whole document) because 'parent' is already a Tag attribute
parentNode = soup.project.parent

How can I get around this limitation?

1 Answer:

Answer 0 (score: 3)

You can use find() instead:

soup.project.find('parent')
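A self-contained sketch of the find() approach, assuming bs4 is installed. It uses Python's built-in html.parser so that lxml is not required; that parser lowercases tag names, so artifactId has to be looked up as artifactid (with lxml installed, BeautifulSoup(markup, "xml") keeps the original case). The recursive=False call goes slightly beyond the answer: it limits the search to direct children of <project>, so a deeper tag that happens to be called parent cannot be matched by mistake.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the POM from the question, with the closing </project> added.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <parent>
        <groupId>com.parent</groupId>
        <artifactId>parent</artifactId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <build>
        <sourceDirectory>src</sourceDirectory>
    </build>
</project>"""

# html.parser ships with Python; note that it lowercases tag names.
soup = BeautifulSoup(POM, "html.parser")

# .parent is the real tree-navigation attribute: the node that contains <project>.
tree_parent = soup.project.parent

# find() looks the child up by tag name, so the name clash does not matter.
parent_node = soup.project.find("parent")

# Same lookup, restricted to the direct children of <project>.
direct_parent_node = soup.project.find("parent", recursive=False)

print(tree_parent is soup)                        # True
print(parent_node.name)                           # parent
print(parent_node.find("artifactid").get_text())  # parent
print(direct_parent_node is parent_node)          # True
```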

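It's basically the same thing, because BeautifulSoup's Tag class implements the dot notation by calling find() inside its __getattr__() method. Python only calls __getattr__() when normal attribute lookup fails, though, and every Tag already has a real parent attribute, so soup.project.parent never reaches find(). Calling find('parent') explicitly sidesteps the clash.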
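To illustrate why the lookup behaves this way, here is a stripped-down sketch of the same mechanism in plain Python. The Node class is hypothetical and is not bs4's code; it only imitates the two relevant pieces: a real parent attribute and a __getattr__ that falls back to find().

```python
class Node:
    """Hypothetical toy tree node imitating bs4-style dot notation (not bs4 code)."""

    def __init__(self, name, children=(), parent=None):
        self.name = name
        self.parent = parent  # a real attribute, like Tag.parent in bs4
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def find(self, name):
        # Depth-first search of the descendants for the first node with this name.
        for child in self.children:
            if child.name == name:
                return child
            found = child.find(name)
            if found is not None:
                return found
        return None

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup has failed,
        # so real attributes such as .parent never reach this method.
        if name.startswith("__"):
            raise AttributeError(name)
        return self.find(name)


project = Node("project", [Node("parent", [Node("artifactId")]), Node("build")])
document = Node("[document]", [project])

print(project.build.name)           # build (no real attribute, falls back to find)
print(project.parent.name)          # [document] (the real attribute wins)
print(project.find("parent").name)  # parent
```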

Hope that helps.