How to insert text from a file into a new XML tag

Time: 2016-11-16 15:47:27

Tags: python xml xml-parsing elementtree

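I have the following code, which tries to parse an XML file so that it reads from external text files (if found), inserts their contents into newly introduced tags, and saves the result as a new XML file.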

The code looks like this:

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
import os

# define our data file
data_file = 'test2_of_2016-09-19.xml'

tree = ET.ElementTree(file=data_file)
root = tree.getroot()

for element in root:
    if element.find('File_directory') is not None:
        directory = element.find('File_directory').text
    if element.find('Introduction') is not None:
        introduction = element.find('Introduction').text
    if element.find('Directions') is not None:
        directions = element.find('Directions').text

for element in root:
    if element.find('File_directory') is not None:
        if element.find('Introduction') is not None:
            intro_tree = directory+introduction
            with open(intro_tree, 'r') as f:
                intro_text = f.read()
            f.closed
            intro_body = ET.SubElement(element,'Introduction_Body')
            intro_body.text = intro_text
        if element.find('Directions') is not None:
            directions_tree = directory+directions
            with open(directions_tree, 'r') as f:
                directions_text = f.read()
            f.closed
            directions_body = ET.SubElement(element,'Directions_Body')
            directions_body.text = directions_text

tree.write('new_' + data_file)

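The problem is that the last instance found of File_directory, Introduction, and Directions seems to be kept and then spread across every entry. That is not what I want, since each entry has its own separate record.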

The source XML file looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Row>
        <Entry_No>1</Entry_No>
        <Waterfall_Name>Bridalveil Fall</Waterfall_Name>
        <File_directory>./waterfall_writeups/1_Bridalveil_Fall/</File_directory>
        <Introduction>introduction-bridalveil-fall.html</Introduction>
        <Directions>directions-bridalveil-fall.html</Directions>
    </Row>
    <Row>
        <Entry_No>52</Entry_No>
        <Waterfall_Name>Switzer Falls</Waterfall_Name>
        <File_directory>./waterfall_writeups/52_Switzer_Falls/</File_directory>
        <Introduction>introduction-switzer-falls.html</Introduction>
        <Directions>directions-switzer-falls.html</Directions>
    </Row>
</Root>

The desired output XML should look like this:

<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Row>
        <Entry_No>1</Entry_No>
        <Waterfall_Name>Bridalveil Fall</Waterfall_Name>
        <File_directory>./waterfall_writeups/1_Bridalveil_Fall/</File_directory>
        <Introduction>introduction-bridalveil-fall.html</Introduction>
        <Directions>directions-bridalveil-fall.html</Directions>
        <Introduction_Body>Text from ./waterfall_writeups/1_Bridalveil_Fall/introduction-bridalveil-fall.html</Introduction_Body>
        <Directions_Body>Text from ./waterfall_writeups/1_Bridalveil_Fall/directions-bridalveil-fall.html</Directions_Body>
    </Row>
    <Row>
        <Entry_No>52</Entry_No>
        <Waterfall_Name>Switzer Falls</Waterfall_Name>
        <File_directory>./waterfall_writeups/52_Switzer_Falls/</File_directory>
        <Introduction>introduction-switzer-falls.html</Introduction>
        <Directions>directions-switzer-falls.html</Directions>
        <Introduction_Body>Text from ./waterfall_writeups/52_Switzer_Falls/introduction-switzer-falls.html</Introduction_Body>
        <Directions_Body>Text from ./waterfall_writeups/52_Switzer_Falls/directions-switzer-falls.html</Directions_Body>
    </Row>
</Root>

But what I end up with is:

<Root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Row>
        <Entry_No>1</Entry_No>
        <Waterfall_Name>Bridalveil Fall</Waterfall_Name>
        <File_directory>./waterfall_writeups/1_Bridalveil_Fall/</File_directory>
        <Introduction>introduction-bridalveil-fall.html</Introduction>
        <Directions>directions-bridalveil-fall.html</Directions>
        <Introduction_Body>Text from ./waterfall_writeups/52_Switzer_Falls/introduction-switzer-falls.html</Introduction_Body>
        <Directions_Body>Text from ./waterfall_writeups/52_Switzer_Falls/directions-switzer-falls.html</Directions_Body>
    </Row>
    <Row>
        <Entry_No>52</Entry_No>
        <Waterfall_Name>Switzer Falls</Waterfall_Name>
        <File_directory>./waterfall_writeups/52_Switzer_Falls/</File_directory>
        <Introduction>introduction-switzer-falls.html</Introduction>
        <Directions>directions-switzer-falls.html</Directions>
        <Introduction_Body>Text from ./waterfall_writeups/52_Switzer_Falls/introduction-switzer-falls.html</Introduction_Body>
        <Directions_Body>Text from ./waterfall_writeups/52_Switzer_Falls/directions-switzer-falls.html</Directions_Body>
    </Row>
</Root>

By the way, is there a way to introduce the body tags' content without it all being printed on one line (for the sake of readability)?

1 answer:

Answer 0 (score: 0)

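The first for loop iterates over the document's Row elements and assigns new values to the directory, introduction, and directions variables on every iteration. When the loop finishes, those variables hold only the values from the last Row element, and the second loop then reuses them for every row.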
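The overwriting can be reproduced in isolation. The following is a minimal sketch, assuming a two-row document shaped like the one in the question; the inline XML string and its ./a/ and ./b/ paths are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-row document, trimmed down from the question's structure.
xml_text = """<Root>
    <Row><File_directory>./a/</File_directory></Row>
    <Row><File_directory>./b/</File_directory></Row>
</Root>"""
root = ET.fromstring(xml_text)

# Same pattern as the question's first loop: one variable, reassigned per row.
for element in root:
    directory = element.find('File_directory').text

# Only the value from the last Row survives the loop.
print(directory)  # prints ./b/
```

Any later loop that uses directory therefore sees ./b/ for every row, which matches the output shown in the question.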

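What I would do is build a dictionary that maps tag names to their text content, and then use that mapping to add the new sub-elements dynamically. Example (without reading the referenced files):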

for row in root:
    elements = {}
    for node in row:
        elements[node.tag] = node.text

    directory = elements['File_directory']

    intro_tree = directory + elements['Introduction']
    intro_body = ET.SubElement(row, 'Introduction_Body')
    intro_body.text = 'Text from %s' % intro_tree

    directions_tree = directory + elements['Directions']
    directions_body = ET.SubElement(row, 'Directions_Body')
    directions_body.text = 'Text from %s' % directions_tree
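The example above leaves out the file reading. One way to put it back while keeping each row's values local is sketched below. The function name add_body_elements and the BODY_TAGS mapping are hypothetical names introduced here, and the sketch assumes Python 3, UTF-8 encoded files, and that a missing file should simply be skipped.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical mapping: tag holding a file name -> tag that receives the file's text.
BODY_TAGS = {
    'Introduction': 'Introduction_Body',
    'Directions': 'Directions_Body',
}


def add_body_elements(root):
    """Append *_Body children to each row, reading that row's own files."""
    for row in root:
        # Looked up per row, so nothing carries over from a previous row.
        directory = row.findtext('File_directory')
        if directory is None:
            continue  # this row names no directory, so there is nothing to read
        for source_tag, body_tag in BODY_TAGS.items():
            file_name = row.findtext(source_tag)
            if file_name is None:
                continue  # this row does not reference such a file
            path = os.path.join(directory, file_name)
            if not os.path.isfile(path):
                continue  # assumption: silently skip files that are not found
            # Assumption: the write-ups are UTF-8 encoded.
            with open(path, 'r', encoding='utf-8') as f:
                body = ET.SubElement(row, body_tag)
                body.text = f.read()
```

In the question's script this would be called as add_body_elements(tree.getroot()) just before tree.write('new_' + data_file). On the readability question: ElementTree writes element text as it was read, so line breaks inside the files are kept. On Python 3.9 or later, ET.indent(tree) can additionally re-indent the element structure before writing.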