Python xml.dom.minidom - Please don't escape my strings

Time: 2016-06-24 14:30:55

Tags: xml python-3.x minidom

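I use the minidom module to create XML documents from my data.

Currently I'm struggling to find a Pythonic way to keep minidom from escaping the strings I put in.

The root of all evil is the _write_data function (line 302 of the module):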

def _write_data(writer, data):
    "Writes datachars to writer."
    if data:
        data = data.replace("&", "&amp;").replace("<", "&lt;"). \
                    replace("\"", "&quot;").replace(">", "&gt;")
        writer.write(data)

All I want is my data without those replace calls.
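
To make the effect concrete, here is a minimal sketch of the escaping in action. The sample string is only an illustration; it avoids double quotes because their handling in text nodes may differ between Python versions.

```python
from xml.dom import minidom

# Build a tiny document with one text node containing markup-like characters.
dom = minidom.getDOMImplementation().createDocument(None, 'root', None)
dom.documentElement.appendChild(dom.createTextNode('&#x2603; <b>'))

# Serialising the element runs the text through _write_data,
# so '&', '<' and '>' come out as entity references.
escaped = dom.documentElement.toxml()
print(escaped)
```

The ampersand of the character reference is itself escaped, which is why a reference typed into a text node no longer works as a reference in the output.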

I found a way to prevent this by monkeypatching two functions:

  • writexml of the parent node
  • and, inside that patch:
    • _write_data

I prepared an example:

from xml.dom import minidom

SNOWMAN = '&#x2603;&#xfe0e;'

imp = minidom.getDOMImplementation()
dom = imp.createDocument(None, 'root', None)
root = dom.documentElement

evil = dom.createElement('evil')
root.appendChild(evil)
# this does unwanted double escaping:
evil.appendChild(dom.createTextNode(SNOWMAN))

# now for something completely different ...
# this is some way to fix this:
good = dom.createElement('good')
root.appendChild(good)

# - store original ``writexml`` and ``_write_data``
original_writexml = good.writexml
original_write_data = minidom._write_data


def fake_writexml(writer, indent, addindent, newl):
    def fake_writedata(writer, data):
        if data:
            writer.write(data)

    # - overwrite ``_write_data``
    minidom._write_data = fake_writedata

    # - call original ``writexml``
    # -> which itself calls the now patched ``_write_data``
    original_writexml(writer, indent, addindent, newl)

    # - reset ``_write_data`` again
    minidom._write_data = original_write_data

# - overwrite ``writexml``
good.writexml = fake_writexml

# - do stuff
good.appendChild(dom.createTextNode(SNOWMAN))

# -> yay, it works!
print(dom.toprettyxml(indent=' '))

# - reset ``writexml`` again
good.writexml = original_writexml
# -> returns trash again..
print(dom.toprettyxml(indent=' '))

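It produces this output:

<?xml version="1.0" ?>
<root>
 <evil>&amp;#x2603;&amp;#xfe0e;</evil>
 <good>&#x2603;&#xfe0e;</good>
</root>

<?xml version="1.0" ?>
<root>
 <evil>&amp;#x2603;&amp;#xfe0e;</evil>
 <good>&amp;#x2603;&amp;#xfe0e;</good>
</root>

Personally I don't think this is good code, because it messes with the internals of minidom and you have to be careful not to make any mistakes.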
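
One way to reduce the risk of leaving the patch in place is to scope it with unittest.mock.patch.object, which restores the original function on exit even if serialisation raises. The sketch below is an assumption-laden variant, not the code from the question: write_data_unescaped is a hypothetical name, and it accepts extra positional arguments on the assumption that some Python versions call _write_data with more than two arguments.

```python
from unittest import mock
from xml.dom import minidom

SNOWMAN = '&#x2603;&#xfe0e;'


def write_data_unescaped(writer, data, *_ignored):
    """Hypothetical stand-in for minidom._write_data that skips escaping.

    *_ignored absorbs any extra positional arguments, on the assumption
    that some Python versions pass more than (writer, data).
    """
    if data:
        writer.write(data)


dom = minidom.getDOMImplementation().createDocument(None, 'root', None)
dom.documentElement.appendChild(dom.createTextNode(SNOWMAN))

# The patch is active only inside the with block and is undone on exit,
# even if serialisation raises.
with mock.patch.object(minidom, '_write_data', write_data_unescaped):
    raw = dom.documentElement.toxml()

# Outside the block the normal escaping is back.
escaped = dom.documentElement.toxml()
print(raw)
print(escaped)
```

Unlike the per-node patch in the question, this turns escaping off for everything serialised inside the with block, attribute values included, so it only fits when the whole document is meant to be written raw.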

Please tell me the most Pythonic solution you can think of, so I can finally enjoy snowmen ;-)

☃︎

1 Answer:

Answer 0: (score: 1)

Thinking further about my question, I had an idea:

Is it possible to define a new type of node?

Indeed, it is!

from xml.dom import minidom

SNOWMAN = '&#x2603;&#xfe0e;'

imp = minidom.getDOMImplementation()
dom = imp.createDocument(None, 'root', None)

So I defined my own node type:

class RawText(minidom.Text):
    def writexml(self, writer, indent='', addindent='', newl=''):
        '''
        patching minidom.Text.writexml:1087
        the original calls minidom._write_data:302
        below is a combined version of both, but without the '&' replacements and so on..
        '''
        if self.data:
            writer.write('{}{}{}'.format(indent, self.data, newl))

After that I wrote a helper function for the original minidom.Document that creates new nodes of my own type.

def createRawTextNode(data):
    '''
    helper function for minidom.Document:1519 to create Nodes of RawText
    see minidom.Document.createTextNode:1656
    '''
    if not isinstance(data, str):
        raise TypeError('node contents must be a string')
    r = RawText()
    r.data = data
    r.ownerDocument = dom  # there is no self
    return r

# ... and attach the helper function
dom.createRawTextNode = createRawTextNode
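
The helper above relies on the module-level name dom because it has no self. A possible variant, sketched here with a hypothetical function name create_raw_text_node, takes the owning document as a parameter instead, so it works with any document and needs no attribute attached to the Document instance. Note that a raw node writes its data verbatim, so content containing a bare '<' or '&' would make the output malformed.

```python
from xml.dom import minidom


class RawText(minidom.Text):
    """Text node whose data is written without escaping."""

    def writexml(self, writer, indent='', addindent='', newl=''):
        if self.data:
            writer.write('{}{}{}'.format(indent, self.data, newl))


def create_raw_text_node(document, data):
    """Hypothetical helper: build a RawText node owned by `document`."""
    if not isinstance(data, str):
        raise TypeError('node contents must be a string')
    node = RawText()
    node.data = data
    node.ownerDocument = document
    return node


dom = minidom.getDOMImplementation().createDocument(None, 'root', None)
good = dom.createElement('good')
dom.documentElement.appendChild(good)
good.appendChild(create_raw_text_node(dom, '&#x2603;&#xfe0e;'))

result = dom.documentElement.toxml()
print(result)
```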

Then carry on as if nothing had happened:

root = dom.documentElement

evil = dom.createElement('evil')
root.appendChild(evil)
evil.appendChild(dom.createTextNode(SNOWMAN))

good = dom.createElement('good')
root.appendChild(good)
# use helper function to create Nodes of RawText
good.appendChild(dom.createRawTextNode(SNOWMAN))

# yay, works! |o_0|
print(dom.toprettyxml(indent=' '))

Finally it does what I want!

Having both escaped and unescaped strings in my output is no problem.

<?xml version="1.0" ?>
<root>
 <evil>&amp;#x2603;&amp;#xfe0e;</evil>
 <good>&#x2603;&#xfe0e;</good>
</root>
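
As an aside, and a different technique from the RawText node above: if the goal is only to get character references for non-ASCII characters into the output, it may be enough to store the real characters and let the serialiser produce the references. The sketch below assumes that passing encoding='ascii' to toprettyxml is acceptable for the use case; in that mode the result is bytes, and the references appear in decimal rather than hexadecimal form.

```python
import html
from xml.dom import minidom

SNOWMAN = '&#x2603;&#xfe0e;'

dom = minidom.getDOMImplementation().createDocument(None, 'root', None)
good = dom.createElement('good')
dom.documentElement.appendChild(good)

# Store the real characters (U+2603 U+FE0E) rather than reference text.
good.appendChild(dom.createTextNode(html.unescape(SNOWMAN)))

# With an ASCII target encoding, characters that do not fit are written
# as numeric character references; the result is bytes, not str.
ascii_xml = dom.toprettyxml(indent=' ', encoding='ascii')
print(ascii_xml.decode('ascii'))
```

This keeps the normal escaping of '&' and '<' intact, at the cost of not controlling the exact spelling of the references.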