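When parsing a text file, I need to avoid creating duplicate branches in the XML tree. Suppose the text file looks like this (the lines appear in random order):
BRANCH1:branch11:message11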
BRANCH1:branch12:message12
BRANCH2:branch21:message21
BRANCH2:branch22:message22
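The resulting XML tree should therefore have one root with two branches, and each of those branches should have two sub-branches. The Python code I use to parse this text file is: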
import string
fh = open('xmlbasic.txt', 'r')
allLines = fh.readlines()
fh.close()
import xml.etree.ElementTree as ET
root = ET.Element('root')
for line in allLines:
    tempv = line.split(':')
    branch1 = ET.SubElement(root, tempv[0])
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]
tree = ET.ElementTree(root)
tree.write('xmlbasictree.xml')
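The problem with this code is that it creates a new branch in the XML tree for every line of the text file.
Any suggestions on how to avoid creating another branch in the XML tree when a branch with that name already exists?
Answer 0 (score: 1)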
import xml.etree.ElementTree as ET

with open("xmlbasic.txt") as lines_file:
    # splitlines() yields one string per line, without the trailing newline
    lines = lines_file.read().splitlines()

root = ET.Element('root')
for line in lines:
    if not line.strip():
        continue  # skip blank lines
    head, subhead, tail = line.split(":")
    head_branch = root.find(head)
    # Compare with None: an Element without children can test as false
    if head_branch is None:
        head_branch = ET.SubElement(root, head)
    subhead_branch = head_branch.find(subhead)
    if subhead_branch is None:
        subhead_branch = ET.SubElement(head_branch, subhead)
    subhead_branch.text = tail

tree = ET.ElementTree(root)
ET.dump(tree)
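The logic is simple, and you already stated it in your question: before creating a branch, check whether a branch with that name already exists in the tree.
Note that this can be inefficient, because every line triggers a find() call that scans the children of an element one by one. ElementTree is not designed to enforce uniqueness.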
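The check-before-create step described above can be factored into a small helper. This is only a minimal sketch, not part of the original answer: `get_or_create` and `build_tree` are hypothetical names, and the sample lines are inlined instead of being read from `xmlbasic.txt`. It compares the result of `find()` with `None` rather than relying on truthiness, since an `Element` without children has historically tested as false.

```python
import xml.etree.ElementTree as ET


def get_or_create(parent, tag):
    """Return the direct child of `parent` named `tag`, creating it if absent."""
    child = parent.find(tag)
    if child is None:  # find() returns None when no child matches
        child = ET.SubElement(parent, tag)
    return child


def build_tree(lines):
    """Build a <root> element from 'HEAD:subhead:text' lines without duplicates."""
    root = ET.Element('root')
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # maxsplit=2 keeps any further colons inside the message text
        head, subhead, tail = line.split(':', 2)
        leaf = get_or_create(get_or_create(root, head), subhead)
        leaf.text = tail
    return root


# Sample input in shuffled order, mirroring the question's file format
sample = [
    "BRANCH2:branch21:message21\n",
    "BRANCH1:branch11:message11\n",
    "BRANCH2:branch22:message22\n",
    "BRANCH1:branch12:message12\n",
]
root = build_tree(sample)
print(ET.tostring(root, encoding='unicode'))
```

Each first-level tag ends up under the root exactly once, in the order it was first seen, regardless of how the input lines are shuffled.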
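If you need speed (you probably don't, especially for small trees!), a more efficient approach is to store the tree structure in a defaultdict first and convert it to an ElementTree afterwards.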
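import collections
import xml.etree.ElementTree as ET

with open("xmlbasic.txt") as lines_file:
    # splitlines() yields one string per line, without the trailing newline
    lines = lines_file.read().splitlines()

# head -> {subhead: text}; dict keys are unique, so duplicates collapse
root_dict = collections.defaultdict(dict)
for line in lines:
    if not line.strip():
        continue  # skip blank lines
    head, subhead, tail = line.split(":")
    root_dict[head][subhead] = tail

# Convert the nested dict into an element tree
root = ET.Element('root')
for head, branch in root_dict.items():
    head_element = ET.SubElement(root, head)
    for subhead, tail in branch.items():
        ET.SubElement(head_element, subhead).text = tail

tree = ET.ElementTree(root)
ET.dump(tree)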
Answer 1 (score: 0)
Something along these lines? You can keep the first-level branches in a dict and reuse them.
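# allLines, root and ET come from the code in the question
b1map = {}  # first-level tag -> Element already created for it
for line in allLines:
    tempv = line.strip().split(':')
    branch1 = b1map.get(tempv[0])
    if branch1 is None:
        # first time this tag is seen: create the branch and remember it
        branch1 = b1map[tempv[0]] = ET.SubElement(root, tempv[0])
    branch2 = ET.SubElement(branch1, tempv[1])
    branch2.text = tempv[2]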
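The dict-based reuse above covers only the first level, so a repeated second-level name would still be created twice. One possible extension, sketched here under assumptions, keys the cache on the tuple path so both levels are reused. The function name `build_tree_cached` and the inlined sample lines are hypothetical and not from the original answer.

```python
import xml.etree.ElementTree as ET


def build_tree_cached(lines):
    """Build a <root> element, reusing branches through a dict keyed by path."""
    root = ET.Element('root')
    cache = {}  # maps a tuple path such as ('BRANCH1', 'branch11') to its Element
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, subhead, text = line.split(':', 2)
        parent = root
        # Walk the two levels, creating each element only on first sight
        for key in ((head,), (head, subhead)):
            node = cache.get(key)
            if node is None:
                node = cache[key] = ET.SubElement(parent, key[-1])
            parent = node
        parent.text = text  # a later line with the same path overwrites the text
    return root


sample = [
    "BRANCH1:branch11:message11",
    "BRANCH2:branch21:message21",
    "BRANCH1:branch12:message12",
    "BRANCH1:branch11:updated",
]
root = build_tree_cached(sample)
print(ET.tostring(root, encoding='unicode'))
```

Lookups in the dict avoid rescanning the children of an element for every line, at the cost of keeping one extra reference per branch.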