How to remove specific tags from XML using Python

Time: 2015-09-02 13:16:48

Tags: python xml

I have to remove some specific tags from the apache-tomcat web.xml file.

web.xml

    <?xml version="1.0" encoding="ISO-8859-1"?>



<web-app xmlns="http://java.sun.com/xml/ns/javaee"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
                      http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
  version="3.0">

  <!-- ======================== Introduction ============================== -->
  <!-- This document defines default values for *all* web applications      -->
  <!-- loaded into this instance of Tomcat.  As each application is         -->
  <!-- deployed, this file is processed, followed by the                    -->
  <!-- "/WEB-INF/web.xml" deployment descriptor from your own               -->
  <!-- applications.                                                        -->
  <!--                                                                      -->
  <!-- WARNING:  Do not configure application-specific resources here!      -->
  <!-- They should go in the "/WEB-INF/web.xml" file in your application.   -->

     <servlet>
        <servlet-name>default</servlet-name>
        <servlet-class>org.apache.catalina.servlets.DefaultServlet</servlet-class>
        <init-param>
            <param-name>debug</param-name>
            <param-value>0</param-value>
        </init-param>
        <init-param>
            <param-name>listings</param-name>
            <param-value>false</param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>
   <servlet>
        <servlet-name>jsp</servlet-name>
        <servlet-class>org.apache.jasper.servlet.JspServlet</servlet-class>
        <init-param>
            <param-name>fork</param-name>
            <param-value>false</param-value>
        </init-param>
        <init-param>
            <param-name>xpoweredBy</param-name>
            <param-value>false</param-value>
        </init-param>
        <load-on-startup>3</load-on-startup>
    </servlet>

    <servlet>
        <servlet-name>cgi</servlet-name>
        <servlet-class>org.apache.catalina.servlets.CGIServlet</servlet-class>
        <init-param>
          <param-name>debug</param-name>
          <param-value>0</param-value>
        </init-param>
        <init-param>
          <param-name>cgiPathPrefix</param-name>
          <param-value>WEB-INF/cgi</param-value>
        </init-param>
         <load-on-startup>5</load-on-startup>
    </servlet>
</web-app>

If servlet-name == cgi, I need to remove the entire servlet tag. My code is as follows:

    from xml.etree.ElementTree import ElementTree
    tree = ElementTree()
    tree.parse('web.xml')
    servlets = tree.findall('servlet')
    print "servlets : ",servlets
    for servlet in servlets:
      servlet_names = foo.findall('servlet-name')
      for servlet_name  in servlet_names:
            if servlet_name == "cgi" :
                    print "servlet_name :", servlet_name
                    servlet.remove(servlet-name)

The output I get is servlets : [] instead of all the servlets, and the for loop is never entered. Can anyone help me?

I am not getting any exception.

#!/usr/bin/python
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
ns = '{http://java.sun.com/xml/ns/javaee}'
servlets = root.findall(ns + 'servlet')
print "servlets : ",servlets
for servlet in servlets:
  servlet_names = servlet.findall(ns + 'servlet-name')
  for servlet_name  in servlet_names:
        if servlet_name.text == "cgi" :
                print "servlet_name :", servlet_name.text
                print "removed the cgi serverlet", root.remove(servlet)

===== Output =====
servlets :  [<Element {http://java.sun.com/xml/ns/javaee}servlet at 7f84e09b35a8>, <Element {http://java.sun.com/xml/ns/javaee}servlet at 7f84e09b3878>, <Element {http://java.sun.com/xml/ns/javaee}servlet at 7f84e09b3bd8>]
servlet_name : cgi
removed the cgi serverlet None

==== I used the pdb tracer to investigate, and it shows the value of the element (servlet) as \n ..

> /apps/manu/python/manunamespace.py(10)<module>()
-> servlet_name=servlet.find('{http://java.sun.com/xml/ns/javaee}servlet-name')
(Pdb) servlet_name
<Element {http://java.sun.com/xml/ns/javaee}servlet-name at 882878>
(Pdb) servlet_name.text
'jsp'
(Pdb) n
> /apps/manu/python/manunamespace.py(11)<module>()
-> print "servlet_name:", servlet_name.text
(Pdb) servlet_name.text
'cgi'
(Pdb) servlet.text
'\n        '
(Pdb) n
servlet_name: cgi
> /apps/manu/python/manunamespace.py(12)<module>()
-> if servlet_name.text == "cgi":
(Pdb) n
> /apps/manu/python/manunamespace.py(13)<module>()
-> print "remove the element"
(Pdb) n
remove the element
> /apps/manu/python/manunamespace.py(14)<module>()
-> print "remove : ",root.remove(servlet)
(Pdb) servlet
<Element {http://java.sun.com/xml/ns/javaee}servlet at 882d88>
(Pdb) servlet.text
'\n 

   '

2 answers:

Answer 0: (score: 1)

This fails:

servlets = tree.findall('servlet')

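because the document contains no elements named plain servlet (that is, servlet without a namespace). The root element specifies: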

xmlns="http://java.sun.com/xml/ns/javaee"

This means that, unless otherwise specified, all elements are in this XML namespace. So you want:

>>> tree.findall('{http://java.sun.com/xml/ns/javaee}servlet')
[<Element {http://java.sun.com/xml/ns/javaee}servlet at 0x7f280ec681b8>,
<Element {http://java.sun.com/xml/ns/javaee}servlet at 0x7f280ec68200>, 
<Element {http://java.sun.com/xml/ns/javaee}servlet at 0x7f280ec682d8>]
>>> 
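
To see the effect in isolation, here is a minimal sketch written for Python 3. The inline XML is a cut-down stand-in for web.xml (an assumption for illustration, not the real file); only the namespace declaration matters here.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for web.xml (illustrative only).
XML = """<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
  <servlet><servlet-name>default</servlet-name></servlet>
  <servlet><servlet-name>cgi</servlet-name></servlet>
</web-app>"""

root = ET.fromstring(XML)

# ElementTree stores the namespace as part of every tag name.
print(root.tag)

# A bare tag name does not match namespaced elements.
unqualified = root.findall("servlet")

# The fully qualified {uri}tag form does.
qualified = root.findall("{http://java.sun.com/xml/ns/javaee}servlet")

print(len(unqualified), len(qualified))
```

Printing root.tag is a quick way to check which namespace, if any, a document actually uses before writing the search expressions.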

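Answer 1: (score: 1)

You can't find the tags you are searching for because they are in the default namespace (http://java.sun.com/xml/ns/javaee).

Also, if you want to test an element's content, you need to use its text attribute rather than comparing against the element itself. And if it matches, you need to remove the servlet tag from the root, not the servlet-name tag from the servlet tag.

Try this: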

from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
nsmap = {'jee': 'http://java.sun.com/xml/ns/javaee'}
servlets = root.findall('jee:servlet', nsmap)
print "servlets : ",servlets
for servlet in servlets:
  servlet_names = servlet.findall('jee:servlet-name', nsmap)
  for servlet_name  in servlet_names:
        if servlet_name.text == "cgi" :
                print "servlet_name :", servlet_name.text
                root.remove(servlet)
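
The two points about the text attribute and the parent element can be checked on their own with a small Python 3 sketch. The inline XML and the variable names (cgi_servlet, element_matches, and so on) are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = "{http://java.sun.com/xml/ns/javaee}"

# Cut-down stand-in for web.xml (illustrative only).
XML = """<web-app xmlns="http://java.sun.com/xml/ns/javaee">
  <servlet><servlet-name>default</servlet-name></servlet>
  <servlet><servlet-name>cgi</servlet-name></servlet>
</web-app>"""

root = ET.fromstring(XML)
cgi_servlet = root.findall(NS + "servlet")[1]
name = cgi_servlet.find(NS + "servlet-name")

# Comparing the Element object itself to a string is never true.
element_matches = (name == "cgi")

# Comparing the element's text content is what the check needs.
text_matches = (name.text == "cgi")

# remove() must be called on the direct parent of the element to drop.
# It changes the tree in place and returns None.
result = root.remove(cgi_servlet)

remaining = [s.find(NS + "servlet-name").text
             for s in root.findall(NS + "servlet")]
print(element_matches, text_matches, result, remaining)
```

Because remove() returns None, printing its return value (as in the question's second attempt) shows None even when the removal succeeded.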

Or, more efficiently, using the supported XPath syntax:

from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
nsmap = {'jee': 'http://java.sun.com/xml/ns/javaee'}
for servlet in root.findall("./jee:servlet[jee:servlet-name='cgi']", nsmap):
    root.remove(servlet)
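
For a quick check of the predicate form without touching a real file, the following Python 3 sketch runs the same expression against a small inline document. The three-servlet XML is a simplified assumption modelled on the question's web.xml.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for web.xml (illustrative only).
XML = """<web-app xmlns="http://java.sun.com/xml/ns/javaee">
  <servlet><servlet-name>default</servlet-name></servlet>
  <servlet><servlet-name>jsp</servlet-name></servlet>
  <servlet><servlet-name>cgi</servlet-name></servlet>
</web-app>"""

nsmap = {"jee": "http://java.sun.com/xml/ns/javaee"}
root = ET.fromstring(XML)

# findall() returns a list, so removing while looping over it is safe.
for servlet in root.findall("./jee:servlet[jee:servlet-name='cgi']", nsmap):
    root.remove(servlet)

remaining_names = [n.text
                   for n in root.findall("jee:servlet/jee:servlet-name", nsmap)]
print(remaining_names)
```

Note that the [tag='text'] predicate compares the complete text of the child element, so a servlet-name with surrounding whitespace would not match 'cgi'.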

Edit: for older Python versions (tested with python2.5):

from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
ns = '{http://java.sun.com/xml/ns/javaee}'
servlets = root.findall(ns + 'servlet')
print "servlets : ",servlets
for servlet in servlets:
  servlet_names = servlet.findall(ns + 'servlet-name')
  for servlet_name  in servlet_names:
        if servlet_name.text == "cgi" :
                print "servlet_name :", servlet_name.text
                root.remove(servlet)
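
None of the snippets above saves the result; root.remove() only changes the tree in memory. A possible way to write it back is sketched below for Python 3. It is an illustration under assumptions: the inline XML stands in for web.xml, and an in-memory buffer stands in for the output file (with a real file you would pass a file name to tree.write instead).

```python
import io
import xml.etree.ElementTree as ET

JEE = "http://java.sun.com/xml/ns/javaee"
nsmap = {"jee": JEE}

# Simplified stand-in for web.xml (illustrative only).
XML = """<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
  <servlet><servlet-name>default</servlet-name></servlet>
  <servlet><servlet-name>cgi</servlet-name></servlet>
</web-app>"""

# Serialize the namespace as the default one (xmlns="...") rather than
# with generated ns0: prefixes. This setting is global to ElementTree.
ET.register_namespace("", JEE)

tree = ET.ElementTree(ET.fromstring(XML))
root = tree.getroot()
for servlet in root.findall("./jee:servlet[jee:servlet-name='cgi']", nsmap):
    root.remove(servlet)

# Write to a buffer here; a real script would write to a file path.
buffer = io.BytesIO()
tree.write(buffer, encoding="ISO-8859-1", xml_declaration=True)
output = buffer.getvalue().decode("ISO-8859-1")
print(output)
```

One caveat: the default ElementTree parser discards XML comments, so the comment banner in Tomcat's web.xml would not survive a parse-and-write round trip unless a comment-preserving parser is used.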