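I have several XML files like the one below. I want to use Beautiful Soup in Python to extract their contents (as a data frame) in the expected output format shown below. Please help.
Sample XML: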
<Author AffiliationIDS="Aff1 Aff2" CorrespondingAffiliationID="Aff1" ORCID="http://orcid.org/0000-0003-4649-327X">
<AuthorName DisplayOrder="Western">
<GivenName>Anouk</GivenName>
<GivenName>van der</GivenName>
<FamilyName>Hoorn</FamilyName>
</AuthorName>
<Contact>
<Phone>+31-50-3612400</Phone>
<Fax>+31-50-3611707</Fax>
<Email>a.van.der.hoorn@umcg.nl</Email>
</Contact>
</Author>
<Author AffiliationIDS="Aff1">
<AuthorName DisplayOrder="Western">
<GivenName>Kamal</GivenName>
<GivenName>M.</GivenName>
<FamilyName>Aden</FamilyName>
</AuthorName>
</Author>
<Author AffiliationIDS="Aff1 Aff2">
<AuthorName DisplayOrder="Western">
<GivenName>Peter</GivenName>
<GivenName>Jan</GivenName>
<FamilyName>van Laar</FamilyName>
</AuthorName>
</Author>
Expected output:
Anouk van der Hoorn AuthorName
Kamal M. Aden AuthorName
Peter Jan van Laar AuthorName
Answer 0 (score: 1)
Here is the code; it only takes a few lines:
from bs4 import BeautifulSoup
with open("sample.xml", "r") as f:  # open the XML file
    content = f.read()
# the "lxml" HTML parser lowercases tag names, hence "author" and "authorname"
soup = BeautifulSoup(content, "lxml")
authornames = [author.find("authorname").text.replace("\n", " ") for author in soup.find_all("author")]
print(authornames)
Output:
[' Anouk van der Hoorn ', ' Kamal M. Aden ', ' Peter Jan van Laar ']
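Since the question asks for a data frame, the list above could be turned into one along these lines. This is only a minimal sketch, assuming pandas is installed. The function name authors_to_frame and the column names name and tag are made up for illustration, and it uses Python's built-in html.parser instead of lxml so that no extra parser is needed. Two authors from the sample are inlined so the snippet runs on its own.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Two authors from the sample XML, inlined so the sketch is self-contained.
SAMPLE_XML = """
<Author AffiliationIDS="Aff1 Aff2">
<AuthorName DisplayOrder="Western">
<GivenName>Anouk</GivenName>
<GivenName>van der</GivenName>
<FamilyName>Hoorn</FamilyName>
</AuthorName>
</Author>
<Author AffiliationIDS="Aff1">
<AuthorName DisplayOrder="Western">
<GivenName>Kamal</GivenName>
<GivenName>M.</GivenName>
<FamilyName>Aden</FamilyName>
</AuthorName>
</Author>
"""


def authors_to_frame(xml_text):
    """Hypothetical helper: one row per <Author>, with the full name joined by spaces."""
    # "html.parser" lowercases tag names, so the lookups use
    # "author" and "authorname" rather than the original casing.
    soup = BeautifulSoup(xml_text, "html.parser")
    rows = []
    for author in soup.find_all("author"):
        name_tag = author.find("authorname")
        if name_tag is None:
            continue  # skip authors that have no name element
        # stripped_strings yields each text node with surrounding whitespace
        # removed, which avoids the leading and trailing spaces seen above.
        full_name = " ".join(name_tag.stripped_strings)
        rows.append({"name": full_name, "tag": "AuthorName"})
    # Column names are illustrative assumptions, not from the question.
    return pd.DataFrame(rows, columns=["name", "tag"])


df = authors_to_frame(SAMPLE_XML)
print(df)
```

To run it against a file, read the file contents as in the answer and pass the string to authors_to_frame.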