I have this sample XML:
<pathway>
<relation entry1="62" entry2="64" type="PPrel">
<subtype name="activation" value="-->"/>
</relation>
<relation entry1="54" entry2="55" type="PPrel">
<subtype name="activation" value="-->"/>
<subtype name="phosphorylation" value="+p"/>
</relation>
<relation entry1="55" entry2="82" type="PPrel">
<subtype name="activation" value="-->"/>
<subtype name="phosphorylation" value="+p"/>
</relation>
</pathway>
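I am trying to collect the subtypes into a list, with one item per relation. If a relation has more than one subtype, its subtype names should be combined into a single string.
Example output: ['activation', 'activation;phosphorylation', 'activation;phosphorylation']
My current code is: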
import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
root = tree.getroot()
relation = []
for son in root:
    for step_son in son:
        if len(son.getchildren()) > 1:
            relation.append(step_son.get('name'))
        if len(son.getchildren()) < 2:
            relation.append(step_son.get('name'))
The output I get for relation is:
['activation', 'activation', 'phosphorylation', 'activation', 'phosphorylation']
Any help would be great, thanks!
Answer 0 (score: 2)
Use findall to get every relation element, then iterate over the matches and join the subtype names of each one:
In [35]: from xml.etree import ElementTree
In [36]: xml_string = """
...: <pathway>
...: <relation entry1="62" entry2="64" type="PPrel">
...: <subtype name="activation" value="-->"/>
...: </relation>
...: <relation entry1="54" entry2="55" type="PPrel">
...: <subtype name="activation" value="-->"/>
...: <subtype name="phosphorylation" value="+p"/>
...: </relation>
...: <relation entry1="55" entry2="82" type="PPrel">
...: <subtype name="activation" value="-->"/>
...: <subtype name="phosphorylation" value="+p"/>
...: </relation>
...: </pathway>"""
In [37]: p_element = ElementTree.fromstring(xml_string)
In [38]: result = []
In [39]: for relation in p_element.findall('.//relation'):
...:     result.append(';'.join(x.attrib['name'] for x in relation.findall('.//subtype')))
...:
In [40]: result
Out[40]: ['activation', 'activation;phosphorylation', 'activation;phosphorylation']
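
The same approach can also be written as a standalone script instead of an IPython session. This is only a minimal sketch: it assumes the XML from the question, and the helper name `subtype_names` and its `sep` parameter are made up for illustration. It avoids `getchildren()`, which was removed from `xml.etree.ElementTree` in Python 3.9.

```python
from xml.etree import ElementTree as ET

# Sample document copied from the question.
XML_STRING = """
<pathway>
<relation entry1="62" entry2="64" type="PPrel">
<subtype name="activation" value="-->"/>
</relation>
<relation entry1="54" entry2="55" type="PPrel">
<subtype name="activation" value="-->"/>
<subtype name="phosphorylation" value="+p"/>
</relation>
<relation entry1="55" entry2="82" type="PPrel">
<subtype name="activation" value="-->"/>
<subtype name="phosphorylation" value="+p"/>
</relation>
</pathway>"""


def subtype_names(root, sep=";"):
    """Return one string per <relation>, joining its <subtype> names with sep.

    Hypothetical helper, not part of ElementTree.
    """
    return [
        # findall("subtype") matches only direct children of the relation.
        sep.join(sub.get("name", "") for sub in relation.findall("subtype"))
        for relation in root.iter("relation")
    ]


root = ET.fromstring(XML_STRING)
result = subtype_names(root)
print(result)
```

To read from a file as in the question, `ET.parse('file.xml').getroot()` can be passed to the helper instead of the root built from the string.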