The XML file I am processing looks like this:
<dblp>
<incollection>
<author>Philippe Balbiani</author>
<author>Valentin Goranko</author>
<author>Ruaan Kellerman</author>
<author>Dimiter Vakarelov</author>
<booktitle>Handbook of Spatial Logics</booktitle>
</incollection>
<incollection>
<author>Jochen Renz</author>
<author>Bernhard Nebel</author>
<booktitle>Handbook of AI</booktitle>
</incollection>
...
</dblp>
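As shown above, I want to extract the contents of the `author` and `booktitle` tags, both of which sit inside `incollection` tags. Each `incollection` tag can contain several `author` tags and one `booktitle` tag. While iterating over the `incollection` tags, I want to build one tuple per author, pairing that author with the `booktitle` of the same `incollection`.
My code: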
from bs4 import BeautifulSoup

# getfile() is my own helper that returns the XML content
soup = BeautifulSoup(str(getfile()), 'lxml')
res = soup.find_all('incollection')
author = []
booktitle = []
for each in res:
    for child in each.children:
        if child.name == 'author':
            author.append(child.text)
        elif child.name == 'booktitle':
            booktitle.append(child.text)
# pairs the two flat lists by position
elem_dic = tuple(zip(author, booktitle))
The result I get is:
('Philippe Balbiani', 'Handbook of Spatial Logics')
('Valentin Goranko', 'Handbook of Spatial Logics')
('Ruaan Kellerman', 'Handbook of Spatial Logics')
How can I modify it to get the desired result?
('Philippe Balbiani', 'Handbook of Spatial Logics')
('Valentin Goranko', 'Handbook of Spatial Logics')
('Ruaan Kellerman', 'Handbook of Spatial Logics')
('Dimiter Vakarelov', 'Handbook of Spatial Logics')
('Jochen Renz', 'Handbook of AI')
('Bernhard Nebel', 'Handbook of AI')
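Alternatively, is there a way to repeat the `booktitle` within each `incollection` so that there are as many `booktitle` entries as `author` entries?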
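One possible approach, offered as a sketch rather than a definitive fix: `zip` pairs two lists by position and stops at the shorter one, so collecting all authors and all book titles into two flat lists loses the link between an author and its own `incollection`. Building the tuples inside the loop over each `incollection` avoids that. The example below uses the standard library's `xml.etree.ElementTree` instead of BeautifulSoup so that it runs without extra packages. The names `SAMPLE_XML` and `author_booktitle_pairs` are made up for illustration, and the sample data is the snippet from the question with the elided `...` entries left out.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question (the elided "..." entries are omitted).
SAMPLE_XML = """<dblp>
<incollection>
<author>Philippe Balbiani</author>
<author>Valentin Goranko</author>
<author>Ruaan Kellerman</author>
<author>Dimiter Vakarelov</author>
<booktitle>Handbook of Spatial Logics</booktitle>
</incollection>
<incollection>
<author>Jochen Renz</author>
<author>Bernhard Nebel</author>
<booktitle>Handbook of AI</booktitle>
</incollection>
</dblp>"""


def author_booktitle_pairs(xml_text):
    """Return (author, booktitle) tuples, pairing within each <incollection>."""
    root = ET.fromstring(xml_text)
    pairs = []
    for entry in root.iter("incollection"):
        # Look up the title of this entry only; skip entries that have none.
        title = entry.findtext("booktitle")
        if title is None:
            continue
        # Emit one tuple per author of this entry.
        for author in entry.findall("author"):
            pairs.append((author.text, title))
    return pairs


pairs = author_booktitle_pairs(SAMPLE_XML)
for pair in pairs:
    print(pair)
```

With BeautifulSoup the same structure should carry over: inside `for each in soup.find_all('incollection')`, take `each.find('booktitle')` once and loop over `each.find_all('author')`. Repeating the title once per author and then calling `zip` on the two lists would give the same pairs, but pairing inside the loop keeps the intent clearer.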