Question

I'm trying to get a count of the element (GRADE) values for a given node (SCHOOL). Based on the sample below, the result would be: GR12 = 2, GR10 = 1, GR9 = 4, GR11 = 1:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns1:SchoolUpload xmlns:ns1="http://abcsite.ca">
    <ns1:School>
        <ns1:SchoolID>123456</ns1:SchoolID>
        <ns1:Students>
            <ns1:Student>
                <ns1:ID>1</ns1:ID><ns1:Grade>GR12</ns1:Grade><ns1:Name>A. Green</ns1:Name>
            </ns1:Student>
            <ns1:Student>
                <ns1:ID>2</ns1:ID><ns1:Grade>GR9</ns1:Grade><ns1:Name>B. Green</ns1:Name>
            </ns1:Student>
            <ns1:Student>
                <ns1:ID>3</ns1:ID><ns1:Grade>GR12</ns1:Grade><ns1:Name>A. Blue</ns1:Name>
            </ns1:Student>
            <ns1:Student>
                <ns1:ID>4</ns1:ID><ns1:Grade>GR9</ns1:Grade><ns1:Name>B. Blue</ns1:Name>
            </ns1:Student>
            <ns1:Student>
                <ns1:ID>5</ns1:ID><ns1:Grade>GR11</ns1:Grade><ns1:Name>C. Blue</ns1:Name>
            </ns1:Student>
            <ns1:Student>
                <ns1:ID>6</ns1:ID><ns1:Grade>GR9</ns1:Grade><ns1:Name>A. Redd</ns1:Name>
            </ns1:Student>
            <ns1:Student>
                <ns1:ID>7</ns1:ID><ns1:Grade>GR9</ns1:Grade><ns1:Name>B. Redd</ns1:Name>
            </ns1:Student>
            <ns1:Student>
                <ns1:ID>8</ns1:ID><ns1:Grade>GR10</ns1:Grade><ns1:Name>C. Redd</ns1:Name>
            </ns1:Student>
        </ns1:Students>
    </ns1:School>
</ns1:SchoolUpload>

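My solution iterates over each SCHOOL, searches for and builds a list for each GRADE attribute value, and then uses the len() function to get the element count of each GRADE list: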

school_list = root.findall('.//{http://abcsite.ca}School') #Get list of schools
for school in school_list: 
    gr9 = school.findall("{http://abcsite.ca}Students/Student/*[@{http://abcsite.ca}Grade='GR9']")
    gr10 = school.findall("{http://abcsite.ca}Students/Student/*[@{http://abcsite.ca}Grade='GR10']")
    gr11 = school.findall("{http://abcsite.ca}Students/Student/*[@{http://abcsite.ca}Grade='GR11']")
    gr12 = school.findall("{http://abcsite.ca}Students/Student/*[@{http://abcsite.ca}Grade='GR12']")
    print(len(gr9))
    print(len(gr10))
    print(len(gr11))
    print(len(gr12))

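However, the school.findall() calls can't find the specified attribute values, so no matches are returned. I'm just learning Python (via https://docs.python.org/3.6/library/xml.etree.elementtree.html). I've been trying different ideas all day; I thought this would work, but I can't figure it out. Any suggestions/help would be much appreciated (also, if there's a more elegant solution, I'm all ears).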

---Edit: code modified following the suggestions in the comments below

import xml.etree.ElementTree as ET

def main():
    ns = { 'ns1' : '{http://ontario.ca}' }
    school_file = 'c://Users/dperry2/Desktop/python/schools.XML'
    tree = ET.parse(school_file)
    root = tree.getroot()
    #//I attempted to use the namespace technique with the school list(below), and although it doesn't error, it didn't return anything; school_list was empty?!?!?
    #school_list = root.findall('.//ns1:School') #, ns) 
    school_list = root.findall('.//{http://ontario.ca}School') 
    for school in school_list: 
        gr9 = school.findall("ns1:Students/ns1:Student/ns1:Grade[.='GR9']", ns)
        print(len(gr9))     
main()

Answer 1

Grade is an XML element, not an attribute. In XPath, @ is used to reference an XML attribute, and you aren't reading any XML attribute here:

ns = { 'ns1' : 'http://abcsite.ca' }
school_list = root.findall('.//ns1:School', namespaces=ns) #Get list of schools
for school in school_list: 
    gr9 = school.findall("ns1:Students/ns1:Student[ns1:Grade='GR9']/ns1:Grade", namespaces=ns)
    ....
    print(len(gr9))
    ....
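
For reference, here is a minimal self-contained sketch of the approach above that runs under Python 3. The shortened XML string and the names SAMPLE, BAD_NS, and count_grade are assumptions made for illustration and do not come from the original post. It also shows that the values in the namespace dictionary should be bare URIs: wrapping the URI in braces, as in the asker's edit, appears to be why the prefixed lookup there came back empty.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's XML (assumption: same structure).
SAMPLE = """<ns1:SchoolUpload xmlns:ns1="http://abcsite.ca">
  <ns1:School>
    <ns1:SchoolID>123456</ns1:SchoolID>
    <ns1:Students>
      <ns1:Student><ns1:ID>1</ns1:ID><ns1:Grade>GR12</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>2</ns1:ID><ns1:Grade>GR9</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>3</ns1:ID><ns1:Grade>GR9</ns1:Grade></ns1:Student>
    </ns1:Students>
  </ns1:School>
</ns1:SchoolUpload>"""

NS = {"ns1": "http://abcsite.ca"}        # bare URI, no braces
BAD_NS = {"ns1": "{http://abcsite.ca}"}  # braces included, as in the asker's edit


def count_grade(school, grade, ns=NS):
    """Count the ns1:Grade elements of one school whose text equals grade."""
    path = "ns1:Students/ns1:Student[ns1:Grade='%s']/ns1:Grade" % grade
    return len(school.findall(path, namespaces=ns))


root = ET.fromstring(SAMPLE)
schools = root.findall(".//ns1:School", namespaces=NS)
for school in schools:
    print("GR9:", count_grade(school, "GR9"))
    print("GR12:", count_grade(school, "GR12"))

# With the braces in the mapping, the expanded tag no longer matches anything.
bad_schools = root.findall(".//ns1:School", namespaces=BAD_NS)
print("schools found with BAD_NS:", len(bad_schools))
```

The helper keeps the XPath from the answer unchanged, so only the grade string varies between calls.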

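Because you reference prefixed elements several times in your code, using a dictionary is more convenient, as shown above. With lxml you can use more idiomatic XPath that xml.etree does not support, since xml.etree implements only a limited subset of XPath 1.0 (the [.='text'] predicate below is accepted by xml.etree only from Python 3.7 onward):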

gr9 = school.findall("ns1:Students/ns1:Student/ns1:Grade[.='GR9']", namespaces=ns)

Note that . refers to the current context node, which in the case above is the ns1:Grade element.
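
Since the question also asks for a more elegant approach, one option is to tally every grade in a single pass with collections.Counter instead of running one query per grade. This is only a sketch under stated assumptions: the helper name grade_counts and the inline SAMPLE string are made up for illustration, and the sample simply mirrors the grades listed in the question.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"ns1": "http://abcsite.ca"}

# Inline sample mirroring the grades in the question (names omitted for brevity).
SAMPLE = """<ns1:SchoolUpload xmlns:ns1="http://abcsite.ca">
  <ns1:School>
    <ns1:SchoolID>123456</ns1:SchoolID>
    <ns1:Students>
      <ns1:Student><ns1:ID>1</ns1:ID><ns1:Grade>GR12</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>2</ns1:ID><ns1:Grade>GR9</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>3</ns1:ID><ns1:Grade>GR12</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>4</ns1:ID><ns1:Grade>GR9</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>5</ns1:ID><ns1:Grade>GR11</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>6</ns1:ID><ns1:Grade>GR9</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>7</ns1:ID><ns1:Grade>GR9</ns1:Grade></ns1:Student>
      <ns1:Student><ns1:ID>8</ns1:ID><ns1:Grade>GR10</ns1:Grade></ns1:Student>
    </ns1:Students>
  </ns1:School>
</ns1:SchoolUpload>"""


def grade_counts(school, ns=NS):
    """Return a Counter mapping each grade text to its number of students."""
    grades = school.findall("ns1:Students/ns1:Student/ns1:Grade", namespaces=ns)
    return Counter(g.text for g in grades)


root = ET.fromstring(SAMPLE)
for school in root.findall(".//ns1:School", namespaces=NS):
    school_id = school.findtext("ns1:SchoolID", namespaces=NS)
    counts = grade_counts(school)
    for grade, n in sorted(counts.items()):
        print(school_id, grade, n)
```

To read from a file instead, ET.parse(path).getroot() would take the place of ET.fromstring(SAMPLE). Because the Counter is built from whatever grade values appear, no grade needs to be hard-coded.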

Counting attribute values of XML child nodes with Python 3.x
