Adding new elements to XML with ElementTree

Time: 2014-01-06 05:17:47

Tags: python xml

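I have the following XML structure (this is only part of it). The file actually contains the item types 'TVEpisode', 'TVShow', 'Movie', 'TVSeries' and 'TVSeason'. I need to go through the XML file and check each item for a Description element. If it does not exist, I need to add a Description element under the item (movie, TV series, etc.) and insert the title of the movie, TV episode, etc. as the description.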

<TVSeries>
<Provider>xxx</Provider>
<Title>The World's Fastest Indian</Title>
<Description> The World's Fastest Indian </Description>
<SortTitle>World's Fastest Indian, The</SortTitle>
</TvSeries>

<Movies>
<Provider>xxx</Provider>
<Title>The World's Fastest Indian</Title>
<Description> The World's Fastest Indian </Description>
<SortTitle>World's Fastest Indian, The</SortTitle>
</Movies>

<TVShow>  
<Provider>xxx</Provider>
<Title>The World's Fastest Indian</Title>
<SortTitle>World's Fastest Indian, The</SortTitle>
</TvShow>

There is no Description element under TVShow, so I need to insert the following:

<Description> The World's Fastest Indian </Description>

Part of the XML file:

<Feed xml:base="http://schemas.yyyy.com/xxxx/2011/06/13/ingestion"  xmlns="http://schemas.yyy.com/xxxx/2011/06/13/ingestion">
<Movie>
<Provider>xxx2</Provider>
<Title>The World's Fastest Indian</Title>
<SortTitle>World's Fastest Indian, The</SortTitle>
</Movie>
<TVSeries>
<Provider>xxx</Provider>
<Title>The World's Fastest Indian</Title>
<Description> The World's Fastest Indian </Description>
<SortTitle>World's Fastest Indian, The</SortTitle>
</TvSeries>

I need to iterate through the XML file and insert a Description element wherever one is missing (and also put some text in it).

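This is what I have done. It does find the titles that have no description, but when I try to insert the element into the structure, I get the following error: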

  File "/usr/lib/python2.4/site-packages/elementtree/ElementTree.py", line 293, in insert
    assert iselement(element)
AssertionError
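The assertion fires because `insert()` requires an Element object, while `child.insert(2,"Description")` passes the plain string `"Description"`. A minimal sketch of the difference, using the standard-library `xml.etree.ElementTree` in Python 3 (where the same check raises `TypeError` rather than an `AssertionError`); the small `<TVShow>` document is only an example input:

```python
import xml.etree.ElementTree as ET

show = ET.fromstring(
    "<TVShow><Provider>xxx</Provider>"
    "<Title>The World's Fastest Indian</Title>"
    "<SortTitle>World's Fastest Indian, The</SortTitle></TVShow>"
)

# A bare string is rejected: insert() only accepts Element objects.
try:
    show.insert(2, "Description")
except TypeError as exc:
    print("rejected:", exc)

# Build a real Element, copy the title into it, then insert it at index 2
# (directly after Provider and Title in this example).
desc = ET.Element("Description")
desc.text = show.find("Title").text
show.insert(2, desc)

print([child.tag for child in show])
# ['Provider', 'Title', 'Description', 'SortTitle']
```

When appending the new element at the end of the item is acceptable, `ET.SubElement(show, "Description")` creates and attaches it in one call.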

Code:

import elementtree.ElementTree as ET
import sys
import re
output_namespace='http://schemas.yyy.com/xxx/2011/06/13/ingestion'

types_to_remove=['TVEpisode','TVShow','Movie','TVSeries','TVSeason']

if ET.VERSION[0:3] == '1.2':
    #in ET < 1.3, this is a workaround for suppressing prefixes
    def fixtag(tag, namespaces):
        import string
        # given a decorated tag (of the form {uri}tag), return prefixed
        # tag and namespace declaration, if any
        if isinstance(tag, ET.QName):
            tag = tag.text
        namespace_uri, tag = string.split(tag[1:], "}", 1)
        prefix = namespaces.get(namespace_uri)
        if namespace_uri not in namespaces:
            prefix = ET._namespace_map.get(namespace_uri)
            if namespace_uri not in ET._namespace_map:
                prefix = "ns%d" % len(namespaces)
            namespaces[namespace_uri] = prefix
            if prefix == "xml":
                xmlns = None
            else:
                if prefix is not None:
                    nsprefix = ':' + prefix
                else:
                    nsprefix = ''
                xmlns = ("xmlns%s" % nsprefix, namespace_uri)
        else:
            xmlns = None
        if prefix is not None:
            prefix += ":"
        else:
            prefix = ''

        return "%s%s" % (prefix, tag), xmlns

    ET.fixtag = fixtag
    ET._namespace_map[output_namespace] = None
else:
    #For ET > 1.3, use register_namespace function
    ET.register_namespace('', output_namespace)


def descriptionAdd(root,type):
    for child in root.findall('.//{http://schemas.yyy.com/xxx/2011/06/13/ingestion}%s' % type):
        title=child.find('.//{http://schemas.yyy.com/xxx/2011/06/13/ingestion}Title').text
        try:
            if child.find('.//{http://schemas.yyy.com/xxx/2011/06/13 /ingestion}Description').text=="":
                print("")
        except:
            print ' %s - couldn\'t find description' % (title)
            print(child.tag)
            child.insert(2,"Description")

####Do the actual work and writing new changes to the new xml file.

tree = ET.parse(sys.argv[1])
root = tree.getroot()
for type in types_to_remove:
    descriptionAdd(root,type)

tree.write(sys.argv[2])

1 Answer:

Answer 0 (score: 1)

Update

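I think I now see what you want. Here is how I would do it. Note that you need to apply this to the parent element that contains the movies, TV shows, etc. Also note that letter case matters (see the comment in the code below).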
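XML element names are case-sensitive, so a `<TVSeries>` closed by `</TvSeries>` (as in the question's snippets) is not well-formed, and `find()` only matches the exact spelling of a tag. A short Python 3 sketch illustrating both points; `is_well_formed` is a hypothetical helper written for this illustration:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if ElementTree can parse `text`, False on a ParseError."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Opening and closing tags that differ only in case do not match.
print(is_well_formed("<TVSeries><Title>x</Title></TvSeries>"))  # False
print(is_well_formed("<TVSeries><Title>x</Title></TVSeries>"))  # True

# find() is case-sensitive too: 'title' does not match <Title>.
series = ET.fromstring("<TVSeries><Title>x</Title></TVSeries>")
print(series.find("Title").text)  # x
print(series.find("title"))       # None
```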

First, the function:

def insert_description(element):
    '''Inserts the Title as a Description if Description not present.'''
    for sub_e in element:
        if sub_e.find('Description') is None:
            title = sub_e.find('Title').text
            new_desc = ET.Element('Description')
            new_desc.text = title
            sub_e.insert(2, new_desc)

Now test the function:

>>> xml = '''
<Root>
 <Movie>
  <Provider>xxx2</Provider>
  <Title>The World's Fastest Indian</Title>
  <SortTitle>World's Fastest Indian, The</SortTitle>
 </Movie>
 <TVSeries>
  <Provider>xxx</Provider>
  <Title>The World's Fastest Indian</Title>
  <Description> The World's Fastest Indian </Description>
  <SortTitle>World's Fastest Indian, The</SortTitle>
  </TVSeries> // note that I changed the v to an upper-case V
</Root>'''
>>> root = ET.fromstring(xml)
>>> insert_description(root)
>>> print ET.tostring(root)
<Root>
 <Movie>
  <Provider>xxx2</Provider>
  <Title>The World's Fastest Indian</Title>
  <Description>The World's Fastest Indian</Description>
  <SortTitle>World's Fastest Indian, The</SortTitle>
 </Movie>
 <TVSeries>
  <Provider>xxx</Provider>
  <Title>The World's Fastest Indian</Title>
  <Description> The World's Fastest Indian </Description>
  <SortTitle>World's Fastest Indian, The</SortTitle>
 </TVSeries> // note that I changed the v to an upper-case V
</Root>

I indented the latter output by hand to make it clearer what is happening.
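The test above parses a `<Root>` element with no namespace. The feed in the question declares a default namespace on `<Feed>`, so in the real file every tag is stored as `{uri}Name` and a plain `find('Description')` would never match. A Python 3 sketch for that case, not part of the original answer: it assumes the namespace URI is exactly the placeholder from the question's `<Feed xmlns="...">` and that the item types are the five listed in the question; `q`, `add_missing_descriptions` and `process_file` are hypothetical helper names. It also places the new element right after `<Title>` instead of at a fixed index, and uses `ET.register_namespace('', NS)` (the same call the question uses for ElementTree 1.3+) so the output keeps a default namespace instead of `ns0:` prefixes.

```python
import xml.etree.ElementTree as ET

# Placeholder URI copied from the question's <Feed xmlns="...">; the real
# file's URI must be used here, character for character.
NS = "http://schemas.yyy.com/xxxx/2011/06/13/ingestion"
ITEM_TYPES = ["TVEpisode", "TVShow", "Movie", "TVSeries", "TVSeason"]

# Serialize NS as the default namespace instead of an "ns0:" prefix.
ET.register_namespace("", NS)


def q(tag):
    """Qualify a local tag name with the feed namespace: '{NS}tag'."""
    return "{%s}%s" % (NS, tag)


def add_missing_descriptions(root):
    """Add <Description> (copying <Title>) to items lacking one; return count."""
    added = 0
    for item_type in ITEM_TYPES:
        # iter() walks the whole subtree, so nesting depth does not matter.
        for item in root.iter(q(item_type)):
            # find() with a bare tag only looks at direct children.
            if item.find(q("Description")) is not None:
                continue
            title = item.find(q("Title"))
            if title is None:
                continue  # no title to copy
            desc = ET.Element(q("Description"))
            desc.text = title.text
            # Place it directly after <Title> rather than at a fixed index.
            item.insert(list(item).index(title) + 1, desc)
            added += 1
    return added


def process_file(in_path, out_path):
    """Read a feed file, fill in missing descriptions, write the result."""
    tree = ET.parse(in_path)
    count = add_missing_descriptions(tree.getroot())
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
    return count


# Small in-memory example shaped like the feed in the question.
feed = ET.fromstring(
    '<Feed xmlns="%s">'
    "<Movie><Provider>xxx2</Provider>"
    "<Title>The World's Fastest Indian</Title>"
    "<SortTitle>World's Fastest Indian, The</SortTitle></Movie>"
    "</Feed>" % NS
)
print(add_missing_descriptions(feed))  # 1
print(ET.tostring(feed, encoding="unicode"))
```

Running `add_missing_descriptions` a second time on the same tree adds nothing, since every item then already has a `<Description>` child.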