Question

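When parsing a text file, I need to avoid creating duplicate branches in an xml tree. Suppose the text file looks like this (the order of the lines is random):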

BRANCH1:branch11:message11
BRANCH1:branch12:message12
BRANCH2:branch21:message21
BRANCH2:branch22:message22

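So the resulting xml tree should have a root with two branches, and each of those branches should have two sub-branches. The Python code I use to parse this text file is: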

import string
fh = open ('xmlbasic.txt', 'r')
allLines = fh.readlines()
fh.close()
import xml.etree.ElementTree as ET
root = ET.Element('root')

for line in allLines:
   tempv = line.split(':')
   branch1 = ET.SubElement(root, tempv[0])
   branch2 = ET.SubElement(branch1, tempv[1])
   branch2.text = tempv[2]

tree = ET.ElementTree(root)
tree.write('xmlbasictree.xml')

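The problem with this code is that a new branch is created in the xml tree for every line of the text file.

Any suggestions on how to avoid creating another branch in the xml tree if a branch with that name already exists?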

Answer 1

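import xml.etree.ElementTree as ET

# Read the file as a list of lines (read() alone would give one long string)
with open("xmlbasic.txt") as lines_file:
    lines = lines_file.read().splitlines()

root = ET.Element('root')

for line in lines:
    if not line.strip():
        continue  # skip blank lines
    # maxsplit=2 keeps any extra colons inside the message
    head, subhead, tail = line.strip().split(":", 2)

    # find() returns None when no direct child has this tag.
    # Compare with "is None": an element without children is falsy.
    head_branch = root.find(head)
    if head_branch is None:
        head_branch = ET.SubElement(root, head)

    subhead_branch = head_branch.find(subhead)
    if subhead_branch is None:
        subhead_branch = ET.SubElement(head_branch, subhead)

    subhead_branch.text = tail

tree = ET.ElementTree(root)
ET.dump(tree)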

The logic is simple, and you already stated it in your question: before creating a branch, check whether it already exists in the tree.
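The check-before-create step can also be factored into a small helper. This is only a sketch: `get_or_create` and `sample_lines` are names made up for illustration, and the in-memory list stands in for the contents of xmlbasic.txt.

```python
import xml.etree.ElementTree as ET


def get_or_create(parent, tag):
    """Return the direct child of `parent` named `tag`, creating it if absent."""
    child = parent.find(tag)
    if child is None:  # find() returns None when there is no match
        child = ET.SubElement(parent, tag)
    return child


# Hypothetical sample data, deliberately out of order
sample_lines = [
    "BRANCH1:branch11:message11",
    "BRANCH2:branch21:message21",
    "BRANCH1:branch12:message12",
    "BRANCH2:branch22:message22",
]

root = ET.Element("root")
for line in sample_lines:
    head, subhead, tail = line.strip().split(":", 2)
    # Reuse (or create) the first-level branch, then the second-level one
    get_or_create(get_or_create(root, head), subhead).text = tail

print(ET.tostring(root, encoding="unicode"))
```

With this input, BRANCH1 and BRANCH2 each appear once under the root, no matter how the lines are ordered.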

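Note that this can be inefficient, because every line triggers a linear search through the existing branches with find(). ElementTree is not designed to enforce uniqueness.

If you need speed (you probably don't, especially for small trees!), a more efficient approach is to store the tree structure in a defaultdict and convert it to an ElementTree afterwards.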

import collections
import xml.etree.ElementTree as ET

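# Read the file as a list of lines (read() alone would give one long string)
with open("xmlbasic.txt") as lines_file:
    lines = lines_file.read().splitlines()

# head -> {subhead: message}; a repeated key simply overwrites the earlier value
root_dict = collections.defaultdict(dict)
for line in lines:
    if not line.strip():
        continue  # skip blank lines
    head, subhead, tail = line.strip().split(":", 2)
    root_dict[head][subhead] = tail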

root = ET.Element('root')
for head, branch in root_dict.items():
    head_element = ET.SubElement(root, head)
    for subhead, tail in branch.items():
        ET.SubElement(head_element,subhead).text = tail

tree = ET.ElementTree(root)
ET.dump(tree)

Answer 2

Something along these lines? You can keep the first-level branches in a dict and reuse them.

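b1map = {}  # first-level tag -> its Element

for line in allLines:
   # strip() drops the trailing newline that readlines() keeps
   tempv = line.strip().split(':')
   branch1 = b1map.get(tempv[0])
   if branch1 is None:
       # first time this tag is seen: create it and remember it
       branch1 = b1map[tempv[0]] = ET.SubElement(root, tempv[0])
   branch2 = ET.SubElement(branch1, tempv[1])
   branch2.text = tempv[2]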
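The snippet above only reuses first-level branches, so a second-level name that repeats under the same parent would still get a new element. A possible extension, assuming the same `head:subhead:message` line format, is to cache the second level as well. The function name `build_tree` and the two dict names are made up for this sketch.

```python
import xml.etree.ElementTree as ET


def build_tree(lines):
    """Build root/head/subhead elements, reusing any branch already created."""
    root = ET.Element("root")
    heads = {}     # head tag -> Element
    subheads = {}  # (head tag, subhead tag) -> Element
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, subhead, tail = line.split(":", 2)

        head_el = heads.get(head)
        if head_el is None:
            head_el = heads[head] = ET.SubElement(root, head)

        sub_el = subheads.get((head, subhead))
        if sub_el is None:
            sub_el = subheads[(head, subhead)] = ET.SubElement(head_el, subhead)

        # A repeated (head, subhead) pair overwrites the earlier message
        sub_el.text = tail
    return root


demo_root = build_tree([
    "BRANCH1:branch11:first\n",
    "BRANCH2:branch21:message21\n",
    "BRANCH1:branch11:second\n",
])
print(ET.tostring(demo_root, encoding="unicode"))
```

Dict lookups avoid the repeated find() scans, at the cost of keeping two extra mappings alongside the tree.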

Creating an xml tree from a text file using Python
