Parsing XML with multiple tag levels in Python

Date: 2017-02-09 00:00:20

Tags: python xml xml-parsing

I am trying to extract fields from an XML file that contains multiple levels of tags. Take the following example:

  <compound kind="struct">
    <name>my-struct</name>
    <filename>struct____dt__args.html</filename>
    <member kind="variable">
      <type>int32_t</type>
      <name>count</name>
      <anchorfile>struct____dt__args.html</anchorfile>
      <anchor>a0fbe49d8b1189286bd817409658eb631</anchor>
      <arglist></arglist>
    </member>
    <member kind="variable">
      <type>int32_t</type>
      <name>create_type</name>
      <anchorfile>struct____dt__args.html</anchorfile>
      <anchor>a4e38c7f138891d020cce3c6d7e6bc31e</anchor>
      <arglist></arglist>
    </member>
    <member kind="variable">
      <type>size_t</type>
      <name>total_size</name>
      <anchorfile>struct____dt__args.html</anchorfile>
      <anchor>a41ca25bca63ad1fee790134901d8d1c0</anchor>
      <arglist></arglist>
    </member>
  </compound>

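I need to parse this and extract fields from the 'compound' tags. There are multiple compound tags of different kinds (struct, function, class, and so on), and I only need the ones with kind="struct", each followed by the type and name of its child 'member' tags, like this: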

struct my-struct:
int32_t count
int32_t create_type
size_t total_size

1 answer:

Answer 0 (score: 0)

Here is a solution:

from xml.etree import ElementTree


def extract_structs(xml_path):
    # data and xml structure validation omitted
    # result collected as lists and tuples without string formatting
    struct_list = []
    root = ElementTree.parse(xml_path).getroot()
    for compound in root:
        kind = compound.get('kind')
        if kind != 'struct':
            continue
        current_struct = []
        struct_list.append(current_struct)
        struct_name = compound.find('./name').text
        current_struct.append((kind, struct_name))
        for member in compound.findall('./member'):
            member_type = member.find('./type').text
            member_name = member.find('./name').text
            current_struct.append((member_type, member_name))
    return struct_list


if __name__ == '__main__':
    structs = extract_structs('test_file.xml')
    print(structs)
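
The function above returns lists of tuples and leaves string formatting out. If you want text in the shape shown in the question, a variant along these lines might work. It is only a sketch under some assumptions: the `<tagfile>` root element and the trimmed `SAMPLE_XML` are made up for illustration (the question shows just one `<compound>` fragment), and `format_structs` is a hypothetical helper name. It selects the struct compounds with an XPath predicate instead of an `if` check, and uses `findtext` so a missing `<type>` or `<name>` yields an empty string rather than an `AttributeError`.

```python
from xml.etree import ElementTree

# Hypothetical sample input: the <tagfile> root element is an assumption,
# since the question shows only a single <compound> fragment.
SAMPLE_XML = """<tagfile>
  <compound kind="struct">
    <name>my-struct</name>
    <member kind="variable">
      <type>int32_t</type>
      <name>count</name>
    </member>
    <member kind="variable">
      <type>size_t</type>
      <name>total_size</name>
    </member>
  </compound>
  <compound kind="file">
    <name>other.h</name>
  </compound>
</tagfile>
"""


def format_structs(xml_text):
    """Return the struct compounds in xml_text as text, one member per line."""
    root = ElementTree.fromstring(xml_text)
    blocks = []
    # The [@kind='struct'] predicate keeps only compounds whose kind is struct.
    for compound in root.findall("./compound[@kind='struct']"):
        lines = ["struct {}:".format(compound.findtext("name", default=""))]
        for member in compound.findall("member"):
            # findtext returns the default instead of raising when a tag is missing.
            member_type = member.findtext("type", default="")
            member_name = member.findtext("name", default="")
            lines.append("{} {}".format(member_type, member_name))
        blocks.append("\n".join(lines))
    # Separate consecutive structs with a blank line.
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(format_structs(SAMPLE_XML))
```

To read from a file instead of a string, `ElementTree.parse(xml_path).getroot()` can replace the `fromstring` call, as in the function above.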