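I'm using the minidom module to create XML documents from my data.
Right now I'm struggling to find a Pythonic way to stop minidom from escaping the strings I put in.
The root of all evil is the _write_data function (at line 302 of the module):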
def _write_data(writer, data):
    "Writes datachars to writer."
    if data:
        data = data.replace("&", "&amp;").replace("<", "&lt;"). \
                    replace("\"", "&quot;").replace(">", "&gt;")
        writer.write(data)
All I want is the data, without those replace calls.
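To make the effect concrete, here is a minimal sketch of what the escaping does to a text node that already contains a character reference. The document and the sample string are made up for illustration only.

```python
from xml.dom import minidom

# Build a tiny document whose only text node already contains
# a character reference plus a literal "<".
doc = minidom.getDOMImplementation().createDocument(None, 'root', None)
doc.documentElement.appendChild(doc.createTextNode('&#x2603; < 5'))

# Serializing the element runs the text through minidom's escaping,
# so the "&" of the reference is escaped a second time.
serialized = doc.documentElement.toxml()
print(serialized)  # <root>&amp;#x2603; &lt; 5</root>
```

The "&" that starts the reference becomes "&amp;", so a parser reading the result sees the literal characters "&#x2603;" instead of a snowman.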
I found a way to prevent this by monkeypatching two functions: _write_data and the parent node's writexml.
I prepared an example:

from xml.dom import minidom

SNOWMAN = '&#x2603;&#xfe0e;'

imp = minidom.getDOMImplementation()
dom = imp.createDocument(None, 'root', None)
root = dom.documentElement

evil = dom.createElement('evil')
root.appendChild(evil)
# this does unwanted double escaping:
evil.appendChild(dom.createTextNode(SNOWMAN))

# now for something completely different ...
# this is some way to fix this:
good = dom.createElement('good')
root.appendChild(good)

# - store original ``writexml`` and ``_write_data``
original_writexml = good.writexml
original_write_data = minidom._write_data

def fake_writexml(writer, indent, addindent, newl):

    def fake_writedata(writer, data, *ignored):
        # extra positional arguments are accepted and ignored, in case
        # the installed minidom passes more than two
        if data:
            writer.write(data)

    # - overwrite ``_write_data``
    minidom._write_data = fake_writedata
    # - call original ``writexml``
    #   -> which itself calls the now patched ``_write_data``
    original_writexml(writer, indent, addindent, newl)
    # - reset ``_write_data`` again
    minidom._write_data = original_write_data

# - overwrite ``writexml``
good.writexml = fake_writexml
# - do stuff
good.appendChild(dom.createTextNode(SNOWMAN))
# -> yay, it works!
print(dom.toprettyxml(indent=' '))
# - reset ``writexml`` again
good.writexml = original_writexml
# -> returns trash again..
print(dom.toprettyxml(indent=' '))

It produces this output:

<?xml version="1.0" ?>
<root>
 <evil>&amp;#x2603;&amp;#xfe0e;</evil>
 <good>&#x2603;&#xfe0e;</good>
</root>

<?xml version="1.0" ?>
<root>
 <evil>&amp;#x2603;&amp;#xfe0e;</evil>
 <good>&amp;#x2603;&amp;#xfe0e;</good>
</root>

Personally I don't think this is good code, because it messes with the internals of minidom and you have to be careful not to make any mistakes.
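One of the easy mistakes is leaving the patch in place when serialization raises. Here is a rough sketch of the same monkeypatch wrapped in a context manager, so the original function is always restored. Note that unescaped_text and write_raw are hypothetical names, and unlike the per-node patch this variant disables escaping for everything written inside the with block, attribute values included.

```python
import contextlib
from xml.dom import minidom

@contextlib.contextmanager
def unescaped_text():
    """Temporarily swap minidom._write_data for a pass-through writer."""
    original = minidom._write_data

    def write_raw(writer, data, *ignored):
        # Extra positional arguments are ignored (an assumption, in case
        # the installed minidom passes more than two).
        if data:
            writer.write(data)

    minidom._write_data = write_raw
    try:
        yield
    finally:
        # Restore the real function even if serialization fails.
        minidom._write_data = original

doc = minidom.getDOMImplementation().createDocument(None, 'root', None)
doc.documentElement.appendChild(doc.createTextNode('&#x2603;&#xfe0e;'))

with unescaped_text():
    raw = doc.documentElement.toxml()      # written without escaping
escaped = doc.documentElement.toxml()      # normal behaviour is back
```

It still relies on a private function of the module, so it shares the fragility of the original approach.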
Please tell me the cleverest solution you can think of, so I can finally enjoy Snowmans ;-)
☃︎
Answer 0: (score: 1)
Thinking further about my question, I had an idea:
is it possible to define a new type of node?
Indeed, it is!
from xml.dom import minidom

SNOWMAN = '&#x2603;&#xfe0e;'

imp = minidom.getDOMImplementation()
dom = imp.createDocument(None, 'root', None)
So I defined my own node type:
class RawText(minidom.Text):
    def writexml(self, writer, indent='', addindent='', newl=''):
        '''
        patching minidom.Text.writexml:1087
        the original calls minidom._write_data:302
        below is a combined version of both, but without the '&' replacements and so on..
        '''
        if self.data:
            writer.write('{}{}{}'.format(indent, self.data, newl))
After that I wrote a helper function for the original minidom.Document to create new nodes of my own type.
def createRawTextNode(data):
    '''
    helper function for minidom.Document:1519 to create Nodes of RawText
    see minidom.Document.createTextNode:1656
    '''
    if not isinstance(data, str):
        raise TypeError('node contents must be a string')
    r = RawText()
    r.data = data
    r.ownerDocument = dom  # there is no self
    return r

# ... and attach the helper function
dom.createRawTextNode = createRawTextNode
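The helper above reaches for the global dom because it is attached as a plain attribute and never receives self. As a sketch of one alternative, the variant below takes the document as an explicit argument. The name create_raw_text_node is hypothetical, and RawText is repeated only so the snippet runs on its own.

```python
from xml.dom import minidom

class RawText(minidom.Text):
    """Text node that is written out without any escaping."""
    def writexml(self, writer, indent='', addindent='', newl=''):
        if self.data:
            writer.write('{}{}{}'.format(indent, self.data, newl))

def create_raw_text_node(document, data):
    """Hypothetical helper: build a RawText node owned by `document`."""
    if not isinstance(data, str):
        raise TypeError('node contents must be a string')
    node = RawText()
    node.data = data
    node.ownerDocument = document  # passed in instead of a global
    return node

doc = minidom.getDOMImplementation().createDocument(None, 'root', None)
doc.documentElement.appendChild(create_raw_text_node(doc, '&#x2603;'))
result = doc.documentElement.toxml()
```

Passing the document in keeps the helper usable with more than one document at a time, at the cost of a slightly different call than createTextNode.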
Then carry on as if nothing had happened:
root = dom.documentElement

evil = dom.createElement('evil')
root.appendChild(evil)
evil.appendChild(dom.createTextNode(SNOWMAN))

good = dom.createElement('good')
root.appendChild(good)
# use helper function to create Nodes of RawText
good.appendChild(dom.createRawTextNode(SNOWMAN))

# yay, works! |o_0|
print(dom.toprettyxml(indent=' '))
Finally it does what I want!
Both escaped and unescaped strings can appear in my output without any problems.
<?xml version="1.0" ?>
<root>
 <evil>&amp;#x2603;&amp;#xfe0e;</evil>
 <good>&#x2603;&#xfe0e;</good>
</root>
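To see what the difference means for a consumer of the file, here is a small sketch that parses both forms back with minidom. The XML string is written out by hand to mirror the escaped and unescaped elements; it is not captured from a real run.

```python
from xml.dom import minidom

# Hand-written input: <evil> holds the doubly escaped reference,
# <good> holds the reference exactly as a RawText node would emit it.
xml_text = (
    '<root>'
    '<evil>&amp;#x2603;&amp;#xfe0e;</evil>'
    '<good>&#x2603;&#xfe0e;</good>'
    '</root>'
)

parsed = minidom.parseString(xml_text)
evil_text = parsed.getElementsByTagName('evil')[0].firstChild.data
good_text = parsed.getElementsByTagName('good')[0].firstChild.data

print(repr(evil_text))  # the literal characters of the reference
print(repr(good_text))  # the snowman plus its variation selector
```

The flip side is that raw nodes bypass escaping entirely, so a stray < or & in the data would produce output that no longer parses.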