Getting child elements from an XML document using ElementTree

Time: 2016-09-13 17:34:07

Tags: python maven elementtree

I have an XML pom file that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
    <groupId>com.amirsys</groupId>
    <artifactId>components-parent</artifactId>
    <version>RELEASE</version>
</parent>
<artifactId>statdxws</artifactId>
<version>6.5.0-16</version>
<packaging>war</packaging>
<dependencies>
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <version>9.4-1200-jdbc41</version>
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-simple</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>com.amirsys</groupId>
        <artifactId>referencedb</artifactId>
        <version>5.0.0-1</version>
        <exclusions>
            <exclusion>
                <groupId>com.amirsys</groupId>
                <artifactId>jig</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
</dependencies>

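I am trying to use ElementTree to extract the groupIds, artifactIds and versions so I can build dependency objects, but it does not find the dependency tags. This is my code so far: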

tree = ElementTree.parse('pomFile.xml')
root = tree.getroot()
namespace = '{http://maven.apache.org/POM/4.0.0}'
for dependency in root.iter(namespace+'dependency'):
    groupId = dependency.get('groupId')
    artifactId = dependency.get('artifactId')
    version = dependency.get('version')
    print groupId, artifactId, version

This prints nothing, and I can't figure out why the code is not iterating over the dependency tags. Any help would be appreciated.

1 Answer:

Answer 0: (score: 0)

Your XML has a small error: the closing </project> tag is missing. You may simply have left it out when posting the question.
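If the closing tag really were missing from the file itself, ElementTree would be expected to raise a parse error rather than silently return no matches. The following is a minimal sketch of that check on a shortened, made-up fragment of the POM; the helper name is_well_formed is hypothetical and not part of ElementTree.

```python
from xml.etree import ElementTree


def is_well_formed(xml_text):
    """Return True if xml_text parses, False if ElementTree rejects it.

    Hypothetical helper, used here only for illustration.
    """
    try:
        ElementTree.fromstring(xml_text)
        return True
    except ElementTree.ParseError:
        return False


# Shortened stand-in for the POM: the first string lacks </project>.
truncated = "<project><artifactId>statdxws</artifactId>"
complete = truncated + "</project>"

print(is_well_formed(truncated))
print(is_well_formed(complete))
```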

The following works for me:

from xml.etree import ElementTree
tree = ElementTree.parse('pomFile.xml')
root = tree.getroot()
namespace = '{http://maven.apache.org/POM/4.0.0}'
for dependency in root.iter(namespace+'dependency'):
    groupId = dependency.find(namespace+'groupId').text
    artifactId = dependency.find(namespace+'artifactId').text
    version = dependency.find(namespace+'version').text
    print groupId, artifactId, version

$ python -i a.py
org.postgresql postgresql 9.4-1200-jdbc41
com.amirsys referencedb 5.0.0-1
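As an alternative to concatenating the namespace onto every tag name, ElementTree also accepts a prefix-to-URI mapping in find(), findall() and findtext(). The sketch below assumes Python 3 and uses a trimmed, hypothetical variant of the POM in which the second dependency deliberately has no version, to show that findtext() returns None instead of raising. The prefix "m" and the function name list_dependencies are arbitrary choices made for this example.

```python
from xml.etree import ElementTree

# Trimmed, hypothetical POM; the second dependency omits <version> on purpose.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.postgresql</groupId>
      <artifactId>postgresql</artifactId>
      <version>9.4-1200-jdbc41</version>
    </dependency>
    <dependency>
      <groupId>com.amirsys</groupId>
      <artifactId>referencedb</artifactId>
    </dependency>
  </dependencies>
</project>"""

# The prefix "m" is arbitrary; only the URI has to match the document.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def list_dependencies(xml_text):
    """Return (groupId, artifactId, version) tuples for each dependency."""
    root = ElementTree.fromstring(xml_text)
    deps = []
    for dep in root.findall("m:dependencies/m:dependency", NS):
        deps.append((
            dep.findtext("m:groupId", namespaces=NS),
            dep.findtext("m:artifactId", namespaces=NS),
            # findtext returns None when the child element is absent
            dep.findtext("m:version", namespaces=NS),
        ))
    return deps


for group_id, artifact_id, version in list_dependencies(POM):
    print("{} {} {}".format(group_id, artifact_id, version))
```

Compared with dependency.find(...).text, findtext() avoids an AttributeError when a dependency has no version element, which can happen when the version is inherited from a parent POM.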

Your use of .get() is wrong. Take a look at how .get() works. Suppose your XML is:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

and you write Python code like this:

import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
for country in root.findall('country'):
   rank = country.find('rank').text
   name = country.get('name')
   print name, rank

This prints:

Liechtenstein 1
Singapore 4
Panama 68

As you can see, .get() gives you the value of an attribute. The docs are clear on this.
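To make the difference concrete, here is a small sketch on a single inline element, taken from the country data above and shortened. It shows that .get() on a name that exists only as a child element returns None, while .find() followed by .text reads the child's content.

```python
import xml.etree.ElementTree as ET

# One <country> element with an attribute (name) and a child element (rank).
country = ET.fromstring('<country name="Panama"><rank>68</rank></country>')

# .get() looks up attributes on the element itself.
print(country.get("name"))

# "rank" is a child element, not an attribute, so .get() returns None.
print(country.get("rank"))

# Child elements are reached with .find(); their content is in .text.
print(country.find("rank").text)
```

This is the same situation as in the question: groupId, artifactId and version are child elements of dependency, so they have to be read with .find() rather than .get().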