Example XML file:
<main>
<data>
<some>111</some>
<other>222</other>
<more>333</more>
</data>
<data>
<some>444</some>
<other>555</other>
<more>666</more>
</data>
<data>
<some>777</some>
<other>888</other>
<more>999</more>
</data>
</main>
I want to create a list for the children of each data block. For example:
1 = [111, 222, 333]
2 = [444, 555, 666]
3 = [777, 888, 999]
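I want to write a loop that goes through the entire XML file and, for each block, creates a new list holding that block's data (without overwriting the lists created before it).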
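import xml.etree.ElementTree as et

tree = et.parse(xml_file)
root = tree.getroot()
num = 0
for child in root:
    num = []        # rebinds num: the counter is lost
    num += 1        # fails: list += int raises TypeError
    for element in child:
        num.append(element.text)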
I know this code doesn't work, but I hope it gives an idea of what I'm trying to do. I'm not sure how to approach this and am looking for ideas.
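One minimal way to get what the question describes with only the standard library: instead of creating numbered variables, collect each `data` block's values into a dictionary keyed by a counter that `enumerate` supplies. This is a sketch, not code from the thread; the inline `xml_text` string stands in for the file, and with a real file you would use `et.parse(xml_file).getroot()` as in the question.

```python
import xml.etree.ElementTree as et

# Inline copy of the example document; stands in for et.parse(xml_file)
xml_text = """<main>
<data><some>111</some><other>222</other><more>333</more></data>
<data><some>444</some><other>555</other><more>666</more></data>
<data><some>777</some><other>888</other><more>999</more></data>
</main>"""

root = et.fromstring(xml_text)

lists = {}
# enumerate provides the counter, so it never clashes with the list variable
for num, child in enumerate(root, start=1):
    # a fresh list per <data> block, so earlier lists are never overwritten
    lists[num] = [element.text for element in child]

print(lists)
```

Here `lists[1]` plays the role of `1 = [111, 222, 333]` in the question's example (with the values still as strings).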
Answer 0 (score: 1)
You can use BeautifulSoup to parse the XML and store the children of each data block in a dictionary. enumerate can be used to supply the numeric parent keys:
from bs4 import BeautifulSoup as soup
import re

# the 'xml' parser requires lxml to be installed
with open('file.xml') as f:
    d = soup(f.read(), 'xml')
result = {i: [int(j.text) for j in a.find_all(re.compile('some|other|more'))]
          for i, a in enumerate(d.find_all('data'), 1)}
Output:
{1: [111, 222, 333], 2: [444, 555, 666], 3: [777, 888, 999]}
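For comparison, a sketch of the same dictionary built with the standard library's `xml.etree.ElementTree`, which avoids the `bs4` dependency (and `lxml`, which the `'xml'` parser needs). BeautifulSoup filters with a regex's `search()` method, so `'some|other|more'` would also match a tag such as `something`; the sketch below filters on an exact set of tag names instead. That set (`WANTED`) is an illustrative choice, not part of the original answer.

```python
import xml.etree.ElementTree as ET

xml_text = """<main>
<data><some>111</some><other>222</other><more>333</more></data>
<data><some>444</some><other>555</other><more>666</more></data>
<data><some>777</some><other>888</other><more>999</more></data>
</main>"""

WANTED = {"some", "other", "more"}  # exact tag names to keep (illustrative)

root = ET.fromstring(xml_text)
# number each <data> block from 1 and keep only the wanted children, as ints
result = {
    i: [int(child.text) for child in block if child.tag in WANTED]
    for i, block in enumerate(root.iter("data"), 1)
}
print(result)
```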
If you don't want to create a dictionary, you can simply use unpacking:
a, b, c = [[int(i.text) for i in a.find_all(re.compile('some|other|more'))] for a in d.find_all('data')]
Output:
[111, 222, 333]
[444, 555, 666]
[777, 888, 999]
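Unpacking into `a, b, c` only works when the document contains exactly three `data` blocks; with any other count Python raises `ValueError`. A small sketch using the standard-library ElementTree and a hypothetical two-block document, showing the failure and that keeping the rows in a list works for any number of blocks:

```python
import xml.etree.ElementTree as ET

# Hypothetical document with only two <data> blocks
xml_text = "<main><data><some>1</some></data><data><some>2</some></data></main>"

rows = [[int(e.text) for e in block] for block in ET.fromstring(xml_text).iter("data")]

unpack_error = None
try:
    a, b, c = rows  # expects exactly three rows
except ValueError as exc:
    unpack_error = str(exc)
    print("unpacking failed:", unpack_error)

# Iterating over the list works for any number of blocks
for number, row in enumerate(rows, 1):
    print(number, row)
```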
Answer 1 (score: 1)
Here is a version that uses no external libraries:
import xml.etree.ElementTree as ET
xml = '''<main>
<data>
<some>111</some>
<other>222</other>
<more>333</more>
</data>
<data>
<some>444</some>
<other>555</other>
<more>666</more>
</data>
<data>
<some>777</some>
<other>888</other>
<more>999</more>
</data>
</main>'''
root = ET.fromstring(xml)
collected_data = []
for d in root.findall('.//data'):
    collected_data.append([d.find(x).text for x in ['some', 'other', 'more']])
print(collected_data)
# if the output needs to be a dict
collected_data = {idx + 1: entry for idx, entry in enumerate(collected_data)}
print(collected_data)
Output:
[['111', '222', '333'], ['444', '555', '666'], ['777', '888', '999']]
{1: ['111', '222', '333'], 2: ['444', '555', '666'], 3: ['777', '888', '999']}
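The lists above hold strings because `.text` is a string (or `None`). A possible variation, offered as a sketch rather than as part of the answer: `Element.findtext` returns a default (`None` unless given) when a child is missing, so a block without, say, `<more>` does not crash the way `d.find(x).text` would with an `AttributeError`. Using `None` for a missing value and converting to `int` are assumptions about what the data needs; `parse_block` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Second block deliberately lacks <more> to show the missing-child case
xml_text = """<main>
<data><some>111</some><other>222</other><more>333</more></data>
<data><some>444</some><other>555</other></data>
</main>"""

FIELDS = ["some", "other", "more"]

def parse_block(block):
    """Hypothetical helper: return the block's fields as ints, None if absent."""
    values = []
    for name in FIELDS:
        text = block.findtext(name)  # None when the child element is missing
        values.append(int(text) if text is not None else None)
    return values

root = ET.fromstring(xml_text)
collected = {i: parse_block(d) for i, d in enumerate(root.findall(".//data"), 1)}
print(collected)
```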