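I want to extract information from several XML tags that share a similar structure. I loop over each child and append its values to a dictionary. Is there a way to avoid a separate for loop for each tag (such as sn and count in my MWE)?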
from bs4 import BeautifulSoup as bs
import pandas as pd
xml = """
<info>
<tag>
<sn>9-542</sn>
<count>14</count>
</tag>
<tag>
<sn>3-425</sn>
<count>16</count>
</tag>
</info>
"""
bs_obj = bs(xml, "lxml")
info = bs_obj.find_all('tag')
d = {}
# I want to avoid these multiple for-loops
d['sn'] = [i.sn.text for i in info]
d['count'] = [i.count.text for i in info]
pd.DataFrame(d)
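Answer 0 (score: 1)
Consider the following approach.
It uses two for loops only to keep the solution dynamic: if you want another tag, the only thing you need to change is the needed_tags list: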
from collections import defaultdict
d = defaultdict(list)
needed_tags = ['sn', 'count']
for i in info:
    for tag in needed_tags:
        d[tag].append(getattr(i, tag).text)
print(d)
>> defaultdict(<class 'list'>, {'count': ['14', '16'], 'sn': ['9-542', '3-425']})
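The same idea can be wrapped in a small helper. This is only a sketch: the function name collect_tags is hypothetical, and it uses the built-in "html.parser" instead of "lxml" purely to avoid an extra dependency. It looks children up with find() rather than getattr(), since attribute access only falls back to a child lookup when the Tag object has no attribute of its own with that name (a child called name, for example, would return the tag's own name instead). A missing child is recorded as None rather than raising an error.

```python
from collections import defaultdict

from bs4 import BeautifulSoup

xml = """
<info>
<tag>
<sn>9-542</sn>
<count>14</count>
</tag>
<tag>
<sn>3-425</sn>
<count>16</count>
</tag>
</info>
"""


def collect_tags(items, needed_tags):
    """Return {tag name: [text of that child in each item]} (hypothetical helper)."""
    columns = defaultdict(list)
    for item in items:
        for name in needed_tags:
            # Explicit lookup: find() returns None when the child is absent.
            child = item.find(name)
            columns[name].append(child.get_text() if child is not None else None)
    return dict(columns)


# Assumption: "html.parser" stands in for "lxml" so nothing extra is required.
soup = BeautifulSoup(xml, "html.parser")
result = collect_tags(soup.find_all("tag"), ["sn", "count"])
print(result)
# {'sn': ['9-542', '3-425'], 'count': ['14', '16']}
```

Passing the returned dictionary to pd.DataFrame gives the same table as in the question.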
For your exact example, this can be simplified to:
from collections import defaultdict
d = defaultdict(list)
for i in info:
    d['sn'].append(i.sn.text)
    d['count'].append(i.count.text)
print(d)
>> defaultdict(<class 'list'>, {'count': ['14', '16'], 'sn': ['9-542', '3-425']})
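If the end goal is the DataFrame from the question, another option is to build one dictionary per tag element and let pandas assemble the columns. This is a sketch under a few assumptions rather than a definitive answer: the names records and df are illustrative, every element is assumed to contain all of the needed children, and "html.parser" again stands in for "lxml".

```python
import pandas as pd
from bs4 import BeautifulSoup

xml = """
<info>
<tag>
<sn>9-542</sn>
<count>14</count>
</tag>
<tag>
<sn>3-425</sn>
<count>16</count>
</tag>
</info>
"""

needed_tags = ["sn", "count"]

# Assumption: "html.parser" is used here only to avoid the lxml dependency.
soup = BeautifulSoup(xml, "html.parser")

# One dict per <tag> element; assumes each needed child is present.
records = [
    {name: element.find(name).get_text() for name in needed_tags}
    for element in soup.find_all("tag")
]

# Passing columns= fixes the column order explicitly.
df = pd.DataFrame(records, columns=needed_tags)
print(df)
```

The values stay strings here, as in the original code; convert them afterwards (for example with astype) if numeric counts are needed.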