Creating a new list for each iteration over an XML file

Asked: 2019-07-25 23:15:36

Tags: python xml python-3.x list elementtree

Sample XML file:

<main>
   <data>
      <some>111</some>
      <other>222</other>
      <more>333</more>
   </data>
   <data>
      <some>444</some>
      <other>555</other>
      <more>666</more>
   </data>
   <data>
      <some>777</some>
      <other>888</other>
      <more>999</more>
   </data>
</main>

I want to create one list for each data child element. For example:

1 = [111, 222, 333]
2 = [444, 555, 666]
3 = [777, 888, 999]

I want a loop that walks the whole XML file and, for each group of data, creates a new list to hold it without overwriting the lists created earlier.

import xml.etree.ElementTree as et

tree = et.parse(xml_file)
root = tree.getroot()

num = 0
for child in root:
    num = []
    num += 1
    for element in child:
        num.append(element.text)

I know this code does not work, but I hope it gives an idea of what I am trying to do. I am not sure how to approach this and am looking for ideas.

2 answers:

Answer 0 (score: 1)

You can use BeautifulSoup to parse the XML and store the children of each data block in a dictionary. enumerate can supply the numeric keys, starting at 1:

from bs4 import BeautifulSoup as soup
import re

# The 'xml' parser requires the lxml package to be installed.
with open('file.xml') as f:
    d = soup(f.read(), 'xml')
pattern = re.compile('some|other|more')
result = {i: [int(j.text) for j in a.find_all(pattern)] for i, a in enumerate(d.find_all('data'), 1)}

Output:

{1: [111, 222, 333], 2: [444, 555, 666], 3: [777, 888, 999]}

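If you do not want a dictionary, you can simply use unpacking (this works only when the number of data blocks is known in advance):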

a, b, c = [[int(i.text) for i in block.find_all(re.compile('some|other|more'))] for block in d.find_all('data')]

Output:

[111, 222, 333]
[444, 555, 666]
[777, 888, 999]
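
Unpacking ties the code to the number of data blocks in the file. A small sketch of that constraint, using plain lists in place of the parsed result (the names rows, first, rest, and error are illustrative and not part of the original answer):

```python
# Stand-in for the nested list produced by the comprehension above.
rows = [[111, 222, 333], [444, 555, 666], [777, 888, 999]]

# Works: three targets for three inner lists.
a, b, c = rows

# Star-unpacking copes with any non-empty number of inner lists.
first, *rest = rows

# A mismatched number of targets raises ValueError.
error = None
try:
    a, b = rows
except ValueError as exc:
    error = exc

print(first)      # the first inner list
print(len(rest))  # how many lists remain
print(type(error).__name__)
```

When the number of data blocks is not fixed, keeping the nested list or the dictionary is usually the safer choice.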

Answer 1 (score: 1)

Here is a version that uses no external libraries:

import xml.etree.ElementTree as ET

xml = '''<main>
   <data>
      <some>111</some>
      <other>222</other>
      <more>333</more>
   </data>
   <data>
      <some>444</some>
      <other>555</other>
      <more>666</more>
   </data>
   <data>
      <some>777</some>
      <other>888</other>
      <more>999</more>
   </data>
</main>'''

root = ET.fromstring(xml)
collected_data = []
for d in root.findall('.//data'):
    collected_data.append([d.find(x).text for x in ['some', 'other', 'more']])
print(collected_data)
# if the output needs to be a dict
collected_data = {idx + 1: entry for idx, entry in enumerate(collected_data)}
print(collected_data)

Output:

[['111', '222', '333'], ['444', '555', '666'], ['777', '888', '999']]
{1: ['111', '222', '333'], 2: ['444', '555', '666'], 3: ['777', '888', '999']}
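
The code above keeps the values as strings and hard-codes the tag names. A minimal sketch of a variant that stays closer to the loop in the question, assuming every child of each data block contains an integer; collect_rows and XML_TEXT are hypothetical names introduced only for this illustration:

```python
import xml.etree.ElementTree as ET

# Same sample document as in the question, inlined so the sketch is self-contained.
XML_TEXT = """<main>
   <data><some>111</some><other>222</other><more>333</more></data>
   <data><some>444</some><other>555</other><more>666</more></data>
   <data><some>777</some><other>888</other><more>999</more></data>
</main>"""


def collect_rows(root):
    """Return one list of ints per direct child of root (hypothetical helper)."""
    rows = []
    for child in root:
        # A fresh list is built for each <data> block and appended,
        # so lists from earlier iterations are never overwritten.
        rows.append([int(element.text) for element in child])
    return rows


# For a file on disk, ET.parse(xml_file).getroot() would replace this line.
root = ET.fromstring(XML_TEXT)
rows = collect_rows(root)

# Optional: number the lists starting at 1, as in the question.
numbered = dict(enumerate(rows, start=1))

print(rows)
print(numbered)
```

The key change from the question's loop is that the outer container (rows) is created once before the loop, while each inner list is created inside it. Reusing one name for both the counter and the list is what broke the original attempt.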