I have an XML file:
<uniprot created="2010-12-20">
<entry dataset="abc">
<references id="1">
<title>first references</title>
<author>
<person name="Mr. A"/>
<person name="Mr. B"/>
<person name="Mr. C"/>
</author>
<scope> scope 1 for id 1 </scope>
<scope> scope 2 for id 1 </scope>
<scope> scope 2 for id 1 </scope>
</references>
<references id="2">
<title>Second references</title>
<author>
<person name="Mr. D"/>
<person name="Mr. E"/>
<person name="Mr. F"/>
</author>
<scope> scope 1 for id 2 </scope>
<scope> scope 2 for id 2 </scope>
<scope> scope 3 for id 2 </scope>
</references>
<references id="3">
<title>third references</title>
<author>
<person name="Mr. G"/>
<person name="Mr. H"/>
<person name="Mr. I"/>
</author>
<scope> scope 1 for id 3 </scope>
<scope> scope 2 for id 3 </scope>
<scope> scope 3 for id 3 </scope>
</references>
<references id="4">
<title>fourth references</title>
<author>
<person name="Mr. J"/>
<person name="Mr. K"/>
<person name="Mr. L"/>
</author>
<scope> scope 1 for id 4 </scope>
<scope> scope 2 for id 4 </scope>
<scope> scope 3 for id 4 </scope>
</references>
</entry>
</uniprot>
I want to display all the references in this XML in a specific format. Expected output:
First Reference
Mr A, Mr B, Mr C
Scope 1 for id 1, Scope 2 for id 1, Scope 3 for id 1
Second Reference
Mr D, Mr E, Mr F
Scope 1 for id 2, Scope 2 for id 2, Scope 3 for id 2
Third Reference
Mr G, Mr H, Mr I
Scope 1 for id 3, Scope 2 for id 3, Scope 3 for id 3
Fourth Reference
Mr J, Mr K, Mr L
Scope 1 for id 4, Scope 2 for id 4, Scope 3 for id 4
I have written my code and can get the title values in the right format, but I can't get the author information for each individual entry.
import xml.etree.ElementTree as ET

document = ET.parse("recipe.xml")
root = document.getroot()

title = []
author = []
scope = []

# iter() replaces getiterator(), which was removed in Python 3.9
for i in root.iter('title'):
    title.append(i.text)
for j in root.iter('author'):
    # <author> has no text of its own; the names live on its <person> children
    author.append(j.text)
for k in root.iter('scope'):
    scope.append(k.text)

for i, j, k in zip(title, author, scope):
    print(i, j, k)
Answer 0 (score: 0)
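The authors' names are stored in the name attribute of the person tags, not in the text of author, so read them with get('name'). Let's also use a dict to store each reference's data, like this: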
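references = []
# Loop over each <references> element so that authors and scopes
# stay with their own entry instead of being collected from the whole tree.
for ref in root.iter('references'):
    reference = {
        'title': ref.findtext('title'),
        # Names are in the name attribute of each <person>
        'authors': [person.get('name') for person in ref.iter('person')],
        # Scope text is padded with spaces in the file, so strip it
        'scopes': [scope.text.strip() for scope in ref.iter('scope')],
    }
    references.append(reference)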
In the end, you will get a list of dicts like this:
[
    {
        'title': 'Something',
        'authors': [
            'Author 1',
            'Author 2',
        ],
        'scopes': [
            'scope 1',
            'scope 2',
        ]
    }
]
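To get from that list of dicts to the three-line layout the question asks for, something like the following could work. It is only a sketch: the XML is a shortened copy of the question's file inlined as a string, and collect_references and format_reference are hypothetical helper names, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's XML (two references only),
# inlined so the sketch runs on its own.
XML = """<uniprot created="2010-12-20">
<entry dataset="abc">
<references id="1">
<title>first references</title>
<author>
<person name="Mr. A"/>
<person name="Mr. B"/>
</author>
<scope> scope 1 for id 1 </scope>
<scope> scope 2 for id 1 </scope>
</references>
<references id="2">
<title>Second references</title>
<author>
<person name="Mr. D"/>
</author>
<scope> scope 1 for id 2 </scope>
</references>
</entry>
</uniprot>"""


def collect_references(root):
    """Return one dict per <references> element (hypothetical helper)."""
    result = []
    for ref in root.iter('references'):
        result.append({
            'title': ref.findtext('title'),
            'authors': [p.get('name') for p in ref.iter('person')],
            'scopes': [s.text.strip() for s in ref.iter('scope')],
        })
    return result


def format_reference(reference):
    """Render one reference as three lines: title, authors, scopes."""
    return "\n".join([
        reference['title'],
        ", ".join(reference['authors']),
        ", ".join(reference['scopes']),
    ])


references = collect_references(ET.fromstring(XML))
for reference in references:
    print(format_reference(reference))
```

The values are printed exactly as they are stored in the XML, so the capitalisation and the "Mr." punctuation of the question's sample output are not reproduced; that would need an extra formatting step.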
Answer 1 (score: 0)
Using lxml and XPath:
from lxml.etree import fromstring

# x holds the XML document as a string
x = fromstring(x)

def print_references(ref_node):
    # @name selects the attribute values directly, so no .get() is needed
    authors = " ".join(ref_node.xpath('author/person/@name'))
    scope = ", ".join([t.text for t in ref_node.xpath('scope')])
    # First id attribute of the node, or None if it has none
    ref = next(iter(ref_node.xpath('@id')), None)
    print("{} Reference\n{}\n{}\n".format(ref, authors, scope.lstrip()))

references = x.xpath('//references')
for ref in references:
    print_references(ref)
Output:
1 Reference
Mr. A Mr. B Mr. C
scope 1 for id 1 , scope 2 for id 1 , scope 2 for id 1
2 Reference
Mr. D Mr. E Mr. F
scope 1 for id 2 , scope 2 for id 2 , scope 3 for id 2
3 Reference
Mr. G Mr. H Mr. I
scope 1 for id 3 , scope 2 for id 3 , scope 3 for id 3
4 Reference
Mr. J Mr. K Mr. L
scope 1 for id 4 , scope 2 for id 4 , scope 3 for id 4
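This output differs from the layout requested in the question: it shows the id instead of the title, separates authors with spaces, and leaves a space before each comma in the scopes. A possible adjustment is sketched below. It assumes lxml is not installed and uses the standard library's ElementTree instead, whose findall() accepts the same child paths but cannot select attribute values with @name, so the attribute is read with get(). SAMPLE and describe are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

# One reference from the question's XML, inlined so the sketch is self-contained.
SAMPLE = """<entry>
<references id="3">
<title>third references</title>
<author>
<person name="Mr. G"/>
<person name="Mr. H"/>
</author>
<scope> scope 1 for id 3 </scope>
<scope> scope 2 for id 3 </scope>
</references>
</entry>"""


def describe(ref_node):
    """Hypothetical helper: title, then authors, then scopes, one per line."""
    title = ref_node.findtext('title', default='')
    # findall() supports child paths; the attribute itself is read with get()
    authors = ", ".join(p.get('name', '') for p in ref_node.findall('author/person'))
    # Strip each scope before joining so no space is left before the commas
    scopes = ", ".join((s.text or '').strip() for s in ref_node.findall('scope'))
    return "{}\n{}\n{}".format(title, authors, scopes)


root = ET.fromstring(SAMPLE)
blocks = [describe(ref) for ref in root.findall('.//references')]
print("\n".join(blocks))
```

Stripping each scope individually, rather than calling lstrip() on the joined string, is what removes the stray spaces inside the line.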