Parsing a deeply nested XML tree and returning all parents

Posted: 2018-11-27 14:20:42

Tags: python xml

I have this XML:

<nodes>
  <node id="1">
    <nodes>
      <node id="7">
        <nodes>
          <node id="9">
            <nodes>
              <node id="5">
                <nodes>
                  <node id="4">
                    <nodes>
                      <node id="3">
                        <nodes />
                        <variables>
                          <variable id="5"  />
                          <variable id="1"  />
                          <variable id="8"  />
                          <variable id="1"  />
                          <variable id="9"  />
                        </variables>
                      </node>
                    </nodes>
                    <variables>
                      <variable id="4"  />
                      <variable id="6"  />
                      <variable id="8"  />
                    </variables>
                  </node>
                </nodes>
              </node>
            </nodes>
          </node>
        </nodes>
      </node>
    </nodes>
  </node>
</nodes>

I want to get the nodes that have variables assigned to them, i.e. I want this output:

[node_id: [variable_ids]] ['3': ['5','1','8','9'], '4': ['4','6','8']]

I started with the following parsing code:

import xml.etree.ElementTree as ET
root = ET.fromstring(xml)

def iterate_node(eq):
    text = ""
    for node in eq:
        if 'id' in node.keys():
            text = text + " { ID: " +  node.attrib['id'] + " TAG: " + node.tag + " }"
        text = text + iterate_node(node)
    return text

for node_root in root.findall('nodes'):
    print(node_root.tag)
    for eq in node_root:
        print(iterate_node(eq))

But this code does not collect all the variables for each node. How would you parse this XML? Thanks

1 answer:

Answer 0 (score: 0)

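To collect all the variables of each node, you can append the variable IDs to a temporary list and then store them (as a set, to drop duplicates) in a dictionary keyed by the node ID.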

# xml.etree.cElementTree was deprecated in Python 3.3 and removed in 3.9;
# the plain ElementTree module uses the C accelerator automatically.
import xml.etree.ElementTree as ET

xmlstring = '''<nodes>
...
</nodes>'''

def iterate_node(xmlstr):
    node_dict = {}
    root = ET.fromstring(xmlstr)
    # iterate through all elements in document order
    for elem in root.iter():
        if elem.tag == 'node':
            # the node id is the dictionary key
            key = elem.attrib['id']
            for child in elem:
                # only look at this node's own <variables> child
                if child.tag == 'variables':
                    varslist_temp = []
                    for var in child:
                        # add the variable id to the list
                        varslist_temp.append(var.attrib['id'])
                    # store a set to drop duplicate ids (a set does not keep the original order)
                    node_dict[key] = set(varslist_temp)

                # the following else statement adds "empty" nodes to your dict,
                # e.g. {'1': 'None', '7': 'None', ...
                #else:
                    #node_dict[key] = 'None'
    return node_dict

# parse once and reuse the result
node_dict = iterate_node(xmlstring)
print(node_dict)
# {'4': {'8', '6', '4'}, '3': {'8', '9', '5', '1'}}
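Because a set is stored, the order of the variable ids is lost, while the desired output in the question lists them in document order. A minimal sketch of an order-preserving variant, assuming Python 3.7+ (where dict keeps insertion order); variables_by_node is a hypothetical name, and XML is a shortened copy of the question's input with nodes 7, 9 and 5 removed:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's XML (nodes 7, 9 and 5 removed)
XML = """<nodes>
  <node id="1">
    <nodes>
      <node id="4">
        <nodes>
          <node id="3">
            <nodes />
            <variables>
              <variable id="5" />
              <variable id="1" />
              <variable id="8" />
              <variable id="1" />
              <variable id="9" />
            </variables>
          </node>
        </nodes>
        <variables>
          <variable id="4" />
          <variable id="6" />
          <variable id="8" />
        </variables>
      </node>
    </nodes>
  </node>
</nodes>"""


def variables_by_node(xmlstr):
    """Map each node id to its variable ids, in document order, without duplicates."""
    root = ET.fromstring(xmlstr)
    result = {}
    for node in root.iter("node"):
        # 'variables/variable' only matches this node's own <variables> child,
        # not the variables of nested descendant nodes.
        ids = [var.get("id") for var in node.findall("variables/variable")]
        if ids:
            # dict.fromkeys keeps the first occurrence of each id, in order
            result[node.get("id")] = list(dict.fromkeys(ids))
    return result


print(variables_by_node(XML))
# {'4': ['4', '6', '8'], '3': ['5', '1', '8', '9']}
```

Nodes without any variables are skipped by the if ids check, so only nodes 4 and 3 appear.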

In your approach you seem to want one line of output per node. You can get the dictionary items one at a time like this:

for item in node_dict.items():
    print(item)
#('4', {'8', '4', '6'})
#('3', {'8', '9', '5', '1'})
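If one result per node is all you need, a generator can hand the results out while the document is being parsed instead of building the whole dictionary first. A sketch using xml.etree.ElementTree.iterparse, assuming Python 3.7+; iter_node_variables is a hypothetical name, and XML is a shortened copy of the question's input with nodes 7, 9 and 5 removed:

```python
import io
import xml.etree.ElementTree as ET

# Shortened copy of the question's XML (nodes 7, 9 and 5 removed)
XML = """<nodes>
  <node id="1">
    <nodes>
      <node id="4">
        <nodes>
          <node id="3">
            <nodes />
            <variables>
              <variable id="5" />
              <variable id="1" />
              <variable id="8" />
              <variable id="1" />
              <variable id="9" />
            </variables>
          </node>
        </nodes>
        <variables>
          <variable id="4" />
          <variable id="6" />
          <variable id="8" />
        </variables>
      </node>
    </nodes>
  </node>
</nodes>"""


def iter_node_variables(xmlstr):
    """Yield (node_id, [variable_ids]) one node at a time while parsing."""
    source = io.BytesIO(xmlstr.encode("utf-8"))
    # An "end" event fires once an element and all of its children are parsed,
    # so the node's <variables> child is complete at that point.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "node":
            ids = [var.get("id") for var in elem.findall("variables/variable")]
            if ids:
                # drop duplicates, keep document order
                yield elem.get("id"), list(dict.fromkeys(ids))


for node_id, var_ids in iter_node_variables(XML):
    print(node_id, var_ids)
# 3 ['5', '1', '8', '9']
# 4 ['4', '6', '8']
```

Because an end event fires when an element is closed, the innermost node (3) is yielded before its parent (4).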

If the commented-out else statement is enabled, this example produces the following output:

('1', 'None')
('7', 'None')
('9', 'None')
('5', 'None')
('4', {'6', '8', '4'})
('3', {'1', '9', '8', '5'})
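The title also asks for the parents of each node. ElementTree elements keep no reference to their parent, so one common approach is to build a child-to-parent map first. A sketch under that approach, assuming Python 3.7+; nodes_with_ancestors is a hypothetical helper, and it stores a real None (rather than the string 'None') for nodes without a <variables> child, regardless of the order of the node's children:

```python
import xml.etree.ElementTree as ET

# Copy of the XML from the question
XML = """<nodes>
  <node id="1">
    <nodes>
      <node id="7">
        <nodes>
          <node id="9">
            <nodes>
              <node id="5">
                <nodes>
                  <node id="4">
                    <nodes>
                      <node id="3">
                        <nodes />
                        <variables>
                          <variable id="5" />
                          <variable id="1" />
                          <variable id="8" />
                          <variable id="1" />
                          <variable id="9" />
                        </variables>
                      </node>
                    </nodes>
                    <variables>
                      <variable id="4" />
                      <variable id="6" />
                      <variable id="8" />
                    </variables>
                  </node>
                </nodes>
              </node>
            </nodes>
          </node>
        </nodes>
      </node>
    </nodes>
  </node>
</nodes>"""


def nodes_with_ancestors(xmlstr):
    """Return {node_id: (ancestor_ids, variable_ids or None)} for every <node>."""
    root = ET.fromstring(xmlstr)
    # Elements have no parent pointer, so map every child element to its parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    result = {}
    for node in root.iter("node"):
        # Walk up the tree, keeping only <node> ancestors (skipping <nodes> wrappers).
        ancestors = []
        current = parent_of.get(node)
        while current is not None:
            if current.tag == "node":
                ancestors.append(current.get("id"))
            current = parent_of.get(current)
        ancestors.reverse()  # outermost parent first

        # None when the node has no <variables> child at all.
        variables = node.find("variables")
        var_ids = None
        if variables is not None:
            ids = [var.get("id") for var in variables.findall("variable")]
            var_ids = list(dict.fromkeys(ids))  # drop duplicates, keep order

        result[node.get("id")] = (ancestors, var_ids)
    return result


for node_id, (parents, var_ids) in nodes_with_ancestors(XML).items():
    if var_ids:
        print(node_id, "parents:", parents, "variables:", var_ids)
# 4 parents: ['1', '7', '9', '5'] variables: ['4', '6', '8']
# 3 parents: ['1', '7', '9', '5', '4'] variables: ['5', '1', '8', '9']
```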