I'm trying to serialize a tree to text with ElementTree's tostring method:
tostring(root, encoding='UTF-8')
I get a UnicodeDecodeError (traceback below) because one of the Element.text nodes contains the character u'\u2014'. I set the text attribute like this:
my_str = u'\u2014'
el.text = my_str.encode('UTF-8')
How can I serialize the tree to text successfully? Am I encoding the node incorrectly? Thanks.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "crisis_app/converters/to_xml.py", line 129, in convert
return tostring(root, encoding='UTF-8')
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1127, in tostring
ElementTree(element).write(file, encoding, method=method)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 821, in write
serialize(write, self._root, encoding, qnames, namespaces)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 940, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 940, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 940, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 938, in _serialize_xml
write(_escape_cdata(text, encoding))
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1074, in _escape_cdata
return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 288: ordinal not in range(128)
Answer 0 (score: 2)
If you do this:
my_str = u'\u2014'
el.text = my_str.encode('UTF-8')
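you set the text to the UTF-8 encoded version of the Unicode character. It is the same as:
el.text = '\xe2\x80\x94'
Now you no longer have a Unicode character but a sequence of bytes (a Python 2 str).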
If you then do:
tostring(root, encoding='UTF-8')
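you are asking for the content to be encoded as UTF-8. To do that, Python 2 first implicitly decodes the str to unicode using the default encoding (ascii) and then encodes the result as UTF-8. The decode step fails because the byte 0xe2 is outside the ASCII range, which is why you see a UnicodeDecodeError even though you asked for an encode.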
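The hidden decode step can be reproduced on its own. This is only a minimal sketch: the names raw, failed and message are illustrative and do not come from the original code. It performs explicitly the ASCII decode that Python 2 runs implicitly when .encode() is called on a str:

```python
# -*- coding: utf-8 -*-
# The UTF-8 encoding of U+2014 is the three bytes 0xE2 0x80 0x94.
raw = u'\u2014'.encode('utf-8')

# Decoding those bytes as ASCII is what Python 2 attempts behind the
# scenes before it can re-encode a byte string; it cannot succeed.
try:
    raw.decode('ascii')
    failed = False
    message = ''
except UnicodeDecodeError as exc:
    failed = True
    message = str(exc)

print(message)
```

The message names the same byte (0xe2) as the traceback in the question, which suggests the failure comes from the pre-encoded text and not from ElementTree itself.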
ElementTree is perfectly capable of handling unicode, so just give it unicode instead of str:
>>> from xml.etree import ElementTree as et
>>> e = et.Element('test')
>>> e.text = u'\u2014'
>>> s = et.tostring(e)
>>> print s, repr(s)
<test>&#8212;</test> '<test>&#8212;</test>'
>>> s = et.tostring(e, encoding='utf-8')
>>> print s, repr(s)
<test>—</test> '<test>\xe2\x80\x94</test>'