Creating an xml tree from a text file using Python

Time: 2010-09-21 10:03:31

Tags: python xml elementtree

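While parsing a text file, I need to avoid creating duplicate branches in the xml tree. Suppose the text file looks like this (the lines appear in random order):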

BRANCH1:branch11:message11
BRANCH1:branch12:message12
BRANCH2:branch21:message21
BRANCH2:branch22:message22

The resulting xml tree should therefore have a root with two branches, each of which has two sub-branches. The Python code I use to parse this text file is:

import xml.etree.ElementTree as ET

with open('xmlbasic.txt', 'r') as fh:
    allLines = fh.readlines()

root = ET.Element('root')

for line in allLines:
    tempv = line.strip().split(':')   # strip() drops the trailing newline
    # A new first-level element is created for every line, even when one
    # with the same name already exists (this is the problem).
    branch1 = ET.SubElement(root, tempv[0])
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]

tree = ET.ElementTree(root)
tree.write('xmlbasictree.xml')

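The problem with this code is that a new branch is created in the xml tree for every line of the text file, so BRANCH1 and BRANCH2 each end up appearing twice.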
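The duplication is easy to reproduce without a file. The following is a minimal sketch that assumes the four sample lines from the question are inlined as a list (the name sample_lines is illustrative only). It runs the same loop and inspects the root's direct children:

```python
import xml.etree.ElementTree as ET

# Sample input inlined instead of reading xmlbasic.txt (assumption for a self-contained demo)
sample_lines = [
    "BRANCH1:branch11:message11",
    "BRANCH1:branch12:message12",
    "BRANCH2:branch21:message21",
    "BRANCH2:branch22:message22",
]

root = ET.Element("root")
for line in sample_lines:
    tempv = line.split(":")
    branch1 = ET.SubElement(root, tempv[0])   # always creates a new element
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]

# Four top-level children instead of the intended two
print(len(root), [child.tag for child in root])
# → 4 ['BRANCH1', 'BRANCH1', 'BRANCH2', 'BRANCH2']
```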

Any suggestions on how to avoid creating another branch in the xml tree when a branch with that name already exists?

2 answers:

Answer 0 (score: 1)

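import xml.etree.ElementTree as ET

with open("xmlbasic.txt") as lines_file:
    lines = lines_file.read().splitlines()   # one string per line, newlines removed

root = ET.Element('root')

for line in lines:
    if not line.strip():
        continue  # skip blank lines
    head, subhead, tail = line.split(":", 2)

    # Reuse the first-level branch if it already exists.
    # Compare with None: an Element with no children is falsy.
    head_branch = root.find(head)
    if head_branch is None:
        head_branch = ET.SubElement(root, head)

    subhead_branch = head_branch.find(subhead)
    if subhead_branch is None:
        subhead_branch = ET.SubElement(head_branch, subhead)

    subhead_branch.text = tail

tree = ET.ElementTree(root)
ET.dump(tree)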

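The logic is simple, and you already stated it in your question: before creating a branch, check whether it already exists in the tree. The check must be "is None" rather than "not ...", because an existing Element with no children is falsy.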
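The check-before-create step can be wrapped in a small helper. This is a sketch only. get_or_create is a hypothetical name and is not part of ElementTree. The sample lines are inlined for illustration:

```python
import xml.etree.ElementTree as ET

def get_or_create(parent, tag):
    """Return the first direct child of `parent` named `tag`, creating it if absent.

    `get_or_create` is a hypothetical helper, not an ElementTree API.
    """
    child = parent.find(tag)
    # Compare with None: an Element with no children has len() == 0 and is falsy,
    # so `if not child` would wrongly create a duplicate of an existing leaf.
    if child is None:
        child = ET.SubElement(parent, tag)
    return child

root = ET.Element("root")
for line in ["BRANCH1:branch11:message11", "BRANCH1:branch12:message12",
             "BRANCH2:branch21:message21", "BRANCH2:branch22:message22"]:
    head, subhead, tail = line.split(":", 2)
    get_or_create(get_or_create(root, head), subhead).text = tail

print(ET.tostring(root, encoding="unicode"))
```

Calling get_or_create again for an existing leaf such as branch11 returns that same element instead of adding a sibling.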

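Note that this can be inefficient, because every line triggers find() calls, and each call scans an element's children one by one. ElementTree does not index elements by tag name, so it has no built-in way to enforce or quickly look up unique names.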
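For reference, Element.find with a plain tag name only matches direct children of the element it is called on. Searching deeper requires a path or the './/' prefix. Here is a small sketch using a one-branch document built with ET.fromstring:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><BRANCH1><branch11>message11</branch11></BRANCH1></root>")

# A plain tag name only matches direct children of `root`
print(root.find("branch11"))                # None: branch11 is a grandchild
print(root.find("BRANCH1/branch11").text)   # message11 (explicit path)
print(root.find(".//branch11").text)        # message11 ('.//' searches all descendants)
```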


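If you need speed (you probably don't, especially for small trees!), a more efficient approach is to store the tree structure in a defaultdict first and convert it to an ElementTree afterwards: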

import collections
import xml.etree.ElementTree as ET

with open("xmlbasic.txt") as lines_file:
    lines = lines_file.read().splitlines()   # one string per line, newlines removed

# head -> {subhead: tail}; a missing head gets an empty dict automatically
root_dict = collections.defaultdict(dict)
for line in lines:
    if not line.strip():
        continue  # skip blank lines
    head, subhead, tail = line.split(":", 2)
    root_dict[head][subhead] = tail

root = ET.Element('root')
for head, branch in root_dict.items():
    head_element = ET.SubElement(root, head)
    for subhead, tail in branch.items():
        ET.SubElement(head_element, subhead).text = tail

tree = ET.ElementTree(root)
ET.dump(tree)
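The question says the input lines arrive in random order. Because dicts keep insertion order, the output above follows the order in which names first appear. A possible variation, sketched here as a hypothetical build_tree function that takes an in-memory list of lines, sorts the keys so the generated XML stays the same regardless of input order:

```python
import collections
import xml.etree.ElementTree as ET

def build_tree(lines):
    """Build <root><HEAD><subhead>tail</subhead>...</HEAD>...</root> from "head:subhead:tail" lines.

    `build_tree` is a hypothetical name; the logic mirrors the defaultdict answer.
    """
    root_dict = collections.defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        head, subhead, tail = line.split(":", 2)
        root_dict[head][subhead] = tail  # a repeated (head, subhead) keeps the last tail

    root = ET.Element("root")
    # sorted() makes the output independent of the (random) input line order
    for head, branch in sorted(root_dict.items()):
        head_element = ET.SubElement(root, head)
        for subhead, tail in sorted(branch.items()):
            ET.SubElement(head_element, subhead).text = tail
    return root

shuffled = ["BRANCH2:branch22:message22", "BRANCH1:branch11:message11",
            "BRANCH2:branch21:message21", "BRANCH1:branch12:message12"]
print(ET.tostring(build_tree(shuffled), encoding="unicode"))
```

Sorting is optional. Drop sorted() if the first-seen order is what you want.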

Answer 1 (score: 0)

Something along these lines? You can keep the first-level branches in a dict and reuse them.

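import xml.etree.ElementTree as ET

with open('xmlbasic.txt') as fh:
    allLines = fh.read().splitlines()   # one string per line, newlines removed

root = ET.Element('root')
b1map = {}  # first-level tag name -> existing Element

for line in allLines:
    tempv = line.split(':', 2)
    branch1 = b1map.get(tempv[0])
    if branch1 is None:
        # first time this name is seen: create the element and remember it
        branch1 = b1map[tempv[0]] = ET.SubElement(root, tempv[0])
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]

ET.ElementTree(root).write('xmlbasictree.xml')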
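The map in this answer only removes duplicates at the first level. If the same BRANCHx:branchyy pair appeared twice, a second branchyy sibling would still be created. One possible extension keys the dict by a tuple so that both levels are reused. It is sketched below as a hypothetical build_tree_cached function, and it assumes non-blank "head:subhead:tail" lines:

```python
import xml.etree.ElementTree as ET

def build_tree_cached(lines):
    """Hypothetical helper: like the b1map answer, but caches both levels."""
    root = ET.Element("root")
    cache = {}  # (head,) or (head, subhead) -> existing Element
    for line in lines:
        head, subhead, tail = line.strip().split(":", 2)
        branch1 = cache.get((head,))
        if branch1 is None:
            branch1 = cache[(head,)] = ET.SubElement(root, head)
        branch2 = cache.get((head, subhead))
        if branch2 is None:
            branch2 = cache[(head, subhead)] = ET.SubElement(branch1, subhead)
        branch2.text = tail  # a repeated pair overwrites the text instead of adding a sibling
    return root

root = build_tree_cached(["BRANCH1:branch11:message11", "BRANCH1:branch11:message11b",
                          "BRANCH2:branch21:message21"])
print(ET.tostring(root, encoding="unicode"))
```

Each lookup is a dict access, so the cost does not grow with the number of existing siblings.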