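elementtree: getting the content of a specific tag in an XML document

Time: 2017-01-23 16:12:23

Tags: python elementtree

I am trying to extract the content of a specific tag from an XML file.

Sample XML: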

<facts>
    <fact>
        <name>crash</name>
        <full_name>Crash</full_name>
        <variables>
            <variable>
                <name>id</name>
                <proper_name>Crash Instance</proper_name>
                <type>INT</type>
                <interpretation>key</interpretation>
            </variable>
            <variable>
                <name>accident_key</name>
                <proper_name>Case Identifier</proper_name>
                <interpretation>string</interpretation>
                <type>CHAR(9)</type>
            </variable>
            <variable>
                <name>accident_year</name>
                <proper_name>Crash Year</proper_name>
                <interpretation>dim</interpretation>
                <type>INT</type>
            </variable>
        </variables>
    </fact>
    <fact>
        <name>vehicle</name>
        <full_name>Vehicle</full_name>
        <variables>
            <variable>
                <name>id</name>
                <proper_name>Vehicle Instance</proper_name>
                <type>INT</type>
            </variable>
            <variable>
                <name>crash_id</name>
                <proper_name>Crash Instance</proper_name>
                <type>INT</type>
            </variable>
        </variables>
    </fact>
</facts>

I want to extract the contents of every <name> tag inside the <variable> nodes, but only for the crash fact.

Here is my code so far.

import xml.etree.ElementTree as ET

def header(filename, fact):
    lst = []
    tree = ET.parse(filename)  # read in the XML
    for fact in tree.iter(tag='fact'):
        factname = fact.find('name').text
        if factname == fact:  # choose the fact to pull from
            for var in fact.iter(tag='variable'):
                name = var.find('name').text
                lst.append(name)
    return lst  # return a list of all the <name> tags from the Crash fact

newlst = header('schema.xml','crash')

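My output, newlst, should be a list of all the <name> values in the crash fact, but it always comes back empty.

Oddly, if I hard-code everything (and drop the function), it returns the correct output: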

lst = []
tree = ET.parse('schema.xml')
for fact in tree.iter(tag='fact'):
    factname = fact.find('name').text
    if factname == 'crash':
        for var in fact.iter(tag='variable'):
            name = var.find('name').text
            lst.append(name)
print(lst)


Output: ['id', 'accident_key', 'accident_year']

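1 answer:

Answer 0: (score: 4)

In the function, you use the name fact both as a parameter and as the loop variable of the first for loop. Once the loop starts, fact is rebound to an Element, so factname == fact compares a string with an Element and is never true, which is why the list stays empty. Try this version: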

import xml.etree.ElementTree as ET

def header(filename, target_factname):
    lst = []
    tree = ET.parse(filename)  # read in the XML
    for fact in tree.iter(tag='fact'):
        factname = fact.find('name').text
        if factname == target_factname:  # choose the fact to pull from
            for var in fact.iter(tag='variable'):
                name = var.find('name').text
                lst.append(name)
    return lst  # return a list of all the <name> tags from the chosen fact
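
As a self-contained check of the explanation above, here is a minimal sketch that inlines a trimmed copy of the sample XML (so no schema.xml file is needed), shows that a string never compares equal to an Element, and then collects the names with ElementTree's limited XPath support instead of nested loops. The helper name variable_names and the SAMPLE constant are made up for this illustration; they are not part of the question or the answer.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample XML from the question, inlined so this runs on its own.
SAMPLE = """
<facts>
    <fact>
        <name>crash</name>
        <variables>
            <variable><name>id</name></variable>
            <variable><name>accident_key</name></variable>
            <variable><name>accident_year</name></variable>
        </variables>
    </fact>
    <fact>
        <name>vehicle</name>
        <variables>
            <variable><name>id</name></variable>
            <variable><name>crash_id</name></variable>
        </variables>
    </fact>
</facts>
"""

root = ET.fromstring(SAMPLE)

# Why the original function returned []: inside the loop, `fact` is an Element,
# so the test compares a str ('crash') with an Element, which is always False.
first_fact = next(root.iter("fact"))
shadowed_comparison = first_fact.find("name").text == first_fact


def variable_names(root, target_factname):
    """Hypothetical helper: <name> text of every <variable> in the matching <fact>."""
    # [name='...'] keeps only <fact> elements whose <name> child has that exact text.
    # Assumes target_factname contains no single quote.
    path = "./fact[name='{}']/variables/variable/name".format(target_factname)
    return [elem.text for elem in root.findall(path)]


crash_names = variable_names(root, "crash")
print(shadowed_comparison)  # False
print(crash_names)          # ['id', 'accident_key', 'accident_year']
```

The XPath form is only an alternative to the loop in the answer, not a replacement for the fix itself: the essential change is still that the parameter and the loop variable have different names.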