I am trying to extract fields from an XML file that contains several levels of nested tags. For example:
<compound kind="struct">
  <name>my-struct</name>
  <filename>struct____dt__args.html</filename>
  <member kind="variable">
    <type>int32_t</type>
    <name>count</name>
    <anchorfile>struct____dt__args.html</anchorfile>
    <anchor>a0fbe49d8b1189286bd817409658eb631</anchor>
    <arglist></arglist>
  </member>
  <member kind="variable">
    <type>int32_t</type>
    <name>create_type</name>
    <anchorfile>struct____dt__args.html</anchorfile>
    <anchor>a4e38c7f138891d020cce3c6d7e6bc31e</anchor>
    <arglist></arglist>
  </member>
  <member kind="variable">
    <type>size_t</type>
    <name>total_size</name>
    <anchorfile>struct____dt__args.html</anchorfile>
    <anchor>a41ca25bca63ad1fee790134901d8d1c0</anchor>
    <arglist></arglist>
  </member>
</compound>
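I need to parse this and extract fields from the 'compound' tags. The file has many compound tags of different kinds (struct, function, class, and so on); I only need the ones with kind="struct", each followed by the type and name of its child 'member' tags, like this: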
struct my-struct:
int32_t count
int32_t create_type
size_t total_size
Answer 0 (score: 0)
Here is a solution:
from xml.etree import ElementTree


def extract_structs(xml_path):
    # data and xml structure validation omitted
    # result collected as lists and tuples without string formatting
    struct_list = []
    root = ElementTree.parse(xml_path).getroot()
    for compound in root:
        kind = compound.get('kind')
        if kind != 'struct':
            continue
        current_struct = []
        struct_list.append(current_struct)
        struct_name = compound.find('./name').text
        current_struct.append((kind, struct_name))
        for member in compound.findall('./member'):
            member_type = member.find('./type').text
            member_name = member.find('./name').text
            current_struct.append((member_type, member_name))
    return struct_list


if __name__ == '__main__':
    structs = extract_structs('test_file.xml')
    print(structs)
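The function above returns raw lists and tuples rather than text. If you want the layout shown in the question, a small helper along these lines might work. This is only a sketch: `format_structs` is a name made up for illustration, and it assumes each inner list starts with a `(kind, name)` tuple followed by `(type, name)` tuples for the members, which is the shape `extract_structs` builds.

```python
def format_structs(struct_list):
    """Render a list of structs as plain text, one line per struct or member.

    Hypothetical helper: assumes each inner list starts with a
    (kind, struct_name) tuple followed by (member_type, member_name) tuples.
    """
    lines = []
    for struct in struct_list:
        if not struct:
            continue  # defensively skip empty entries
        (kind, struct_name), *members = struct
        lines.append(f"{kind} {struct_name}:")
        for member_type, member_name in members:
            lines.append(f"{member_type} {member_name}")
    return "\n".join(lines)


if __name__ == '__main__':
    # Sample data in the shape expected for the XML in the question.
    sample = [[
        ('struct', 'my-struct'),
        ('int32_t', 'count'),
        ('int32_t', 'create_type'),
        ('size_t', 'total_size'),
    ]]
    print(format_structs(sample))
```

In practice you would pass it the return value of `extract_structs` instead of the hard-coded sample.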