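Counting XML child node attribute values using Python 3.x

Time: 2017-03-27 21:29:48

Tags: xml python-3.x

I am trying to get a count of the element (GRADE) values for a given node (SCHOOL). Based on the example below, the result would be: GR12 = 2, GR10 = 1, GR9 = 4, GR11 = 1: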

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns1:SchoolUpload xmlns:ns1="http://abcsite.ca">
    <ns1:School>
        <ns1:SchoolID>123456</ns1:SchoolID>
        <ns1:Students>
            <ns1:Student>
                <ns1:ID>1</ns1:ID><ns1:Grade>GR12</ns1:Grade><ns1:Name>A. Green</ns1:Name>
            </ns1:Student>
            <ns1:Student>
                <ns1:ID>2</ns1:ID><ns1:Grade>GR9</ns1:Grade><ns1:Name>B. Green</ns1:Name>
            </ns1:Student>
            <ns1:Student>
                <ns1:ID>3</ns1:ID><ns1:Grade>GR12</ns1:Grade><ns1:Name>A. Blue</ns1:Name>
            </ns1:Student>
            <ns1:Student>
                <ns1:ID>4</ns1:ID><ns1:Grade>GR9</ns1:Grade><ns1:Name>B. Blue</ns1:Name>
            </ns1:Student>
            <ns1:Student>
                <ns1:ID>5</ns1:ID><ns1:Grade>GR11</ns1:Grade><ns1:Name>C. Blue</ns1:Name>
            </ns1:Student>
            <ns1:Student>
                <ns1:ID>6</ns1:ID><ns1:Grade>GR9</ns1:Grade><ns1:Name>A. Redd</ns1:Name>
            </ns1:Student>
            <ns1:Student>
                <ns1:ID>7</ns1:ID><ns1:Grade>GR9</ns1:Grade><ns1:Name>B. Redd</ns1:Name>
            </ns1:Student>
            <ns1:Student>
                <ns1:ID>8</ns1:ID><ns1:Grade>GR10</ns1:Grade><ns1:Name>C. Redd</ns1:Name>
            </ns1:Student>
        </ns1:Students>
    </ns1:School>
</ns1:SchoolUpload>

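My solution iterates over each SCHOOL, searches for (and builds a list of) each GRADE attribute value, then uses the len() function to get the number of elements in each GRADE list: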

school_list = root.findall('.//{http://abcsite.ca}School') #Get list of schools
for school in school_list: 
    gr9 = school.findall("{http://abcsite.ca}Students/Student/*[@{http://abcsite.ca}Grade='GR9']")
    gr10 = school.findall("{http://abcsite.ca}Students/Student/*[@{http://abcsite.ca}Grade='GR10']")
    gr11 = school.findall("{http://abcsite.ca}Students/Student/*[@{http://abcsite.ca}Grade='GR11']")
    gr12 = school.findall("{http://abcsite.ca}Students/Student/*[@{http://abcsite.ca}Grade='GR12']")
    print(len(gr9))
    print(len(gr10))
    print(len(gr11))
    print(len(gr12))

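However, the school.findall() calls do not find the specified attribute values, so the returned lists are empty. I am just learning Python (through https://docs.python.org/3.6/library/xml.etree.elementtree.html). I have been trying different ideas all day and thought this one would work, but I cannot figure it out. Any suggestions or help would be greatly appreciated (likewise, if there is a more elegant solution, I'm all ears).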

---Edit: code revised per the suggestions in the comments below

import xml.etree.ElementTree as ET

def main():
    ns = { 'ns1' : '{http://ontario.ca}' }
    school_file = 'c://Users/dperry2/Desktop/python/schools.XML'
    tree = ET.parse(school_file)
    root = tree.getroot()
    #//I attempted to use the namespace technique with the school list(below), and although it doesn't error, it didn't return anything; school_list was empty?!?!?
    #school_list = root.findall('.//ns1:School') #, ns) 
    school_list = root.findall('.//{http://ontario.ca}School') 
    for school in school_list: 
        gr9 = school.findall("ns1:Students/ns1:Student/ns1:Grade[.='GR9']", ns)
        print(len(gr9))     
main()

1 Answer:

Answer 0 (score: 1)

Grade is an XML element, not an attribute. In XPath, @ is used to reference an XML attribute, and you are not reading any XML attributes here:

ns = { 'ns1' : 'http://abcsite.ca' }
school_list = root.findall('.//ns1:School', namespaces=ns) #Get list of schools
for school in school_list: 
    gr9 = school.findall("ns1:Students/ns1:Student[ns1:Grade='GR9']/ns1:Grade", namespaces=ns)
    ....
    print(len(gr9))
    ....
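
To make the snippet above easier to try, here is a minimal self-contained sketch of the same predicate-based lookup using xml.etree.ElementTree. The inline sample is an abbreviated stand-in for the question's file (an assumption, not the real data), and the helper name count_grade is hypothetical. Note that the dictionary values are bare URIs: wrapping them in curly braces, as in the question's edit, appears to be why the prefixed lookup matched nothing there.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the question's file (assumption: same layout).
SAMPLE_XML = """\
<ns1:SchoolUpload xmlns:ns1="http://abcsite.ca">
  <ns1:School>
    <ns1:SchoolID>123456</ns1:SchoolID>
    <ns1:Students>
      <ns1:Student><ns1:ID>1</ns1:ID><ns1:Grade>GR12</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>2</ns1:ID><ns1:Grade>GR9</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>3</ns1:ID><ns1:Grade>GR9</ns1:Grade></ns1:Student>
    </ns1:Students>
  </ns1:School>
</ns1:SchoolUpload>
"""

# Prefix -> namespace URI. The URI is written bare, without curly braces.
NS = {"ns1": "http://abcsite.ca"}


def count_grade(school, grade, ns=NS):
    """Count the students of one school whose ns1:Grade text equals `grade`."""
    # [ns1:Grade='...'] is a child-text predicate, not an attribute test (@).
    path = "ns1:Students/ns1:Student[ns1:Grade='{}']".format(grade)
    return len(school.findall(path, namespaces=ns))


root = ET.fromstring(SAMPLE_XML)
for school in root.findall(".//ns1:School", namespaces=NS):
    for grade in ("GR9", "GR10", "GR11", "GR12"):
        print(grade, count_grade(school, grade))
```

With a real file, ET.parse(path).getroot() would replace ET.fromstring(SAMPLE_XML); the rest should stay the same.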

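Since you reference prefixed elements several times in your code, a dictionary is more convenient, as shown above. With lxml, you can use more idiomatic XPath that xml.etree does not support, because xml.etree supports only a limited subset of XPath 1.0: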

gr9 = school.findall("ns1:Students/ns1:Student/ns1:Grade[.='GR9']", namespaces=ns)

Note that . refers to the current context node, which in the case above is the ns1:Grade element.
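
On the "more elegant solution" part of the question, one option is to skip the per-grade XPath queries and tally the text of every Grade element in a single pass. This is not the approach given in the answer above; it is a sketch that swaps in collections.Counter from the standard library, with the question's sample data inlined (names omitted) and a hypothetical helper name grade_counts.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The question's sample data, inlined and without the Name elements.
SAMPLE_XML = """\
<ns1:SchoolUpload xmlns:ns1="http://abcsite.ca">
  <ns1:School>
    <ns1:SchoolID>123456</ns1:SchoolID>
    <ns1:Students>
      <ns1:Student><ns1:ID>1</ns1:ID><ns1:Grade>GR12</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>2</ns1:ID><ns1:Grade>GR9</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>3</ns1:ID><ns1:Grade>GR12</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>4</ns1:ID><ns1:Grade>GR9</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>5</ns1:ID><ns1:Grade>GR11</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>6</ns1:ID><ns1:Grade>GR9</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>7</ns1:ID><ns1:Grade>GR9</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>8</ns1:ID><ns1:Grade>GR10</ns1:Grade></ns1:Student>
    </ns1:Students>
  </ns1:School>
</ns1:SchoolUpload>
"""

NS = {"ns1": "http://abcsite.ca"}  # bare URI, no curly braces


def grade_counts(school, ns=NS):
    """Return a Counter mapping each grade label to its number of students."""
    grades = school.findall("ns1:Students/ns1:Student/ns1:Grade", namespaces=ns)
    # Skip empty Grade elements and ignore surrounding whitespace.
    return Counter(g.text.strip() for g in grades if g.text)


root = ET.fromstring(SAMPLE_XML)
for school in root.findall(".//ns1:School", namespaces=NS):
    school_id = school.findtext("ns1:SchoolID", namespaces=NS)
    for grade, count in sorted(grade_counts(school).items()):
        print(school_id, grade, count)
```

For the sample data this tallies GR9 = 4, GR12 = 2, GR10 = 1 and GR11 = 1, matching the result the question expects, and it does not need the grade labels to be known in advance.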