Manipulating and saving XML with Python: changing one attribute

Time: 2015-10-22 19:20:37

Tags: python xml

I have this XML:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <SOAP-ENV:Body>
        <m:request xmlns:m="http://www.datapower.com/schemas/management" domain="XXXXX">
            <m:do-action>
                <FlushDocumentCache>
                    <XMLManager class="XMLManager">default</XMLManager>
                </FlushDocumentCache>
                <FlushStylesheetCache>
                    <XMLManager class="XMLManager">default</XMLManager>
                </FlushStylesheetCache>
            </m:do-action>
        </m:request>
    </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

I want to change only the value of the domain attribute (XXXXX).

I did something like this:

import xml.etree.ElementTree as etree
tree = etree.parse('input.xml')
# HOW TO FIND THE VALUE XXXXX AND CHANGE IT WITH A NEW VALUE ???
tree.write('output.xml')

Thanks.

1 answer:

Answer 0 (score: 0):

A few remarks:

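  • You'll see that parsing the XML (from a file) and then writing it to another file does not yield the same result, because the parser alters it. You can test this by simply running the code you posted (without the third line, obviously):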

    import xml.etree.ElementTree as etree
    tree = etree.parse('input.xml')
    tree.write('output.xml')
    
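  • All the SOAP-ENV:* nodes have been converted to ns0:*, and the m:* nodes to ns1:*. To work around this, I had to copy the namespaces from the XML file into the code (the soap_env_ns_name and m_ns_name variables), as described here: Saving XML using ETree in Python. It's not retaining namespaces, and adding ns0, ns1 and removing xmlns tags

  • The SOAP-ENC and the default (xsi, xsd) namespaces have been removed because they are not referenced in the XML. Also, the m namespace declaration has moved from the request node to the Envelope (root) node; I'm not sure whether that is part of the standard, but in most XML I've seen, namespaces are declared on the root node. Either way, there's nothing to be done about it here; Python's parser is not very smart.

  • The bottom line is that you won't get exactly the same output (unless you want to write your own parser, as described here: Python: Update XML-file using ElementTree while conserving layout as much as possible).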
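
The effects listed above can be reproduced with a small sketch. It uses a shortened version of the XML from the question as an inline string, and the round_trip helper name is made up for illustration; only the standard xml.etree.ElementTree calls are real. It serializes the same document once before and once after registering the prefixes:

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
MGMT = "http://www.datapower.com/schemas/management"

# Shortened stand-in for input.xml (assumption: same namespaces, fewer nodes).
XML = (
    '<SOAP-ENV:Envelope xmlns:SOAP-ENV="%s" '
    'xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<SOAP-ENV:Body>'
    '<m:request xmlns:m="%s" domain="XXXXX"/>'
    '</SOAP-ENV:Body>'
    '</SOAP-ENV:Envelope>'
) % (SOAP_ENV, MGMT)


def round_trip(xml_text):
    """Parse xml_text and serialize it again without modifying anything."""
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")


# Without registered prefixes, ElementTree generates ns0, ns1, ...
before = round_trip(XML)

# register_namespace is global: it affects every later serialization.
ET.register_namespace("SOAP-ENV", SOAP_ENV)
ET.register_namespace("m", MGMT)
after = round_trip(XML)

print(before)
print(after)
```

In the second output the original prefixes are back, but the unused xsd declaration is still gone and xmlns:m is declared on the root element rather than on m:request, so the file is equivalent to the input without being byte-for-byte identical.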

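So, here it is. The code is tightly coupled to the XML structure (ugly, but not the ugliest): if the structure changes, the code needs to be updated as well (and I'm not talking about the namespace workaround here):

@EDIT1: added the for loop that registers the namespaces; the previous version behaved as I described in the second bullet. At runtime, though, it did replace the Xs with Ys.

@EDIT2: commented out the domain attribute value test, so the value is now changed regardless of what it currently is.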

import xml.etree.ElementTree as ET

env_node_name = "Envelope"
body_node_name = "Body"
request_node_name = "request"
domain_attr_name = "domain"
domain_attr_val = "XXXXX"
domain_attr_new_val = "YYYYY"

# Workaround: these are the namespace prefixes copied from the xml file
soap_env_ns_name = "SOAP-ENV"
m_ns_name = "m"
#soap_enc_ns_name = "SOAP-ENC"
#xsi_ns_name = "xsi"
#xsd_ns_name = "xsd"

namespaces_dict = {
    soap_env_ns_name: "http://schemas.xmlsoap.org/soap/envelope/",
    m_ns_name: "http://www.datapower.com/schemas/management",

    # Those are simply ignored by the parser as they're not referenced in our xml.
    #"SOAP-ENC": "http://schemas.xmlsoap.org/soap/encoding/",
    #"xsi": "http://www.w3.org/2001/XMLSchema-instance",
    #"xsd": "http://www.w3.org/2001/XMLSchema",
}


def tag(ns, name):
    return "{" + ns + "}" + name


# Register the prefixes so the serializer reuses them instead of ns0, ns1, ...
for prefix, uri in namespaces_dict.items():
    ET.register_namespace(prefix, uri)

tree = ET.parse("input.xml")
root = tree.getroot()
env_gen = root.iter(tag(namespaces_dict[soap_env_ns_name], env_node_name))
# A for loop stops by itself when its iterator is exhausted, so no
# try/except StopIteration is needed around these loops.
for env in env_gen:
    body_gen = env.iter(tag(namespaces_dict[soap_env_ns_name], body_node_name))
    for body in body_gen:
        request_gen = body.iter(tag(namespaces_dict[m_ns_name], request_node_name))
        for request in request_gen:
            if domain_attr_name in request.keys():
                # I didn't fully understand the question: you want to change the value
                # of the 'domain' attribute (in your xml example: "XXXXX") to, let's say,
                # "YYYYY" (as my code does) in one of the 2 cases below:
                # 1: change it only if the current value is "XXXXX"
                # 2: change it regardless of the current value
                # For case 1, uncomment the 'if domain_attr_val ...' line below and indent
                # the request.set(...) call under it; for case 2, leave it commented out.
                #if domain_attr_val == request.get(domain_attr_name):
                request.set(domain_attr_name, domain_attr_new_val)

tree.write("output.xml")
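
If the nested loops feel heavy, a shorter variant might look like the sketch below. It is not the code above, just an alternative under the same assumptions: it relies on ElementTree's path lookup with a prefix-to-URI mapping, and set_domain, NAMESPACES and SAMPLE are hypothetical names introduced here. An inline string stands in for input.xml so the snippet runs on its own:

```python
import xml.etree.ElementTree as ET

# Prefix -> URI mapping, copied from the XML in the question.
NAMESPACES = {
    "SOAP-ENV": "http://schemas.xmlsoap.org/soap/envelope/",
    "m": "http://www.datapower.com/schemas/management",
}

# Shortened stand-in for input.xml (assumption: same structure, fewer nodes).
SAMPLE = (
    '<SOAP-ENV:Envelope '
    'xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
    '<SOAP-ENV:Body>'
    '<m:request xmlns:m="http://www.datapower.com/schemas/management" '
    'domain="XXXXX"><m:do-action/></m:request>'
    '</SOAP-ENV:Body>'
    '</SOAP-ENV:Envelope>'
)


def set_domain(root, new_value):
    """Set 'domain' on every m:request below root; return how many were changed."""
    changed = 0
    # './/m:request' matches m:request at any depth below root.
    for request in root.iterfind(".//m:request", NAMESPACES):
        if "domain" in request.attrib:
            request.set("domain", new_value)
            changed += 1
    return changed


# Keep the original prefixes in the serialized output.
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)

root = ET.fromstring(SAMPLE)
changed = set_domain(root, "YYYYY")
result = ET.tostring(root, encoding="unicode")
print(changed, result)
```

With real files, the same function can be applied to ET.parse("input.xml").getroot() and the tree written back with tree.write("output.xml"), as in the code above. The path expression is less sensitive to where m:request sits in the document, at the cost of also matching any m:request nested elsewhere.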