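I am trying to write out some XML that contains special characters. Where I run into trouble is when I loop over a list of tags to create several field elements with name="tag".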
# -*- coding: utf-8 -*-
import sys
import xml.etree.ElementTree as xml
reload(sys)  # Python 2 only: re-exposes setdefaultencoding, which site.py removes
sys.setdefaultencoding('utf-8')
Code snippet:
check = video['tags'].split(', ')
x = len(check)
y = x - 1
for i in xrange(0, y):
    tagger = xml.SubElement(doc, 'field', name="tag")
    s = check[i]
    tagger.text = s.encode('utf-8')
The problem appears when I try to write the file:
output = open(file_name,'w+')
tree = xml.ElementTree(add)
tree.write(output)
output.close()
I get the following error:
Traceback (most recent call last):
File "xml_breakup3.py", line 108, in <module>
tagger.text = s.encode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0x81 in position 0: invalid start byte
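When I run the code without this snippet, it writes the XML without any problem. If I set tagger.text to any plain string (e.g. '99'), it writes fine. If I make the loop run from 0 to 3, it works. The UnicodeDecodeError only appears when I try to iterate over the whole list.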
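The message means the byte 0x81 is not a valid first byte of a UTF-8 sequence, so the tag data is in some other encoding. In Python 2, calling .encode('utf-8') on a byte string first decodes it implicitly with the default encoding, which sys.setdefaultencoding('utf-8') set to UTF-8. That would also explain the loop symptom if, presumably, the first few tags are plain ASCII. The following is a minimal Python 3 sketch that reproduces the failure on raw bytes and then decodes with an explicit codec instead. The sample tag string is hypothetical, and latin-1 is only an assumed placeholder (it accepts every byte); the real source encoding has to be confirmed.

```python
# Python 3 sketch: tag data arrives as bytes in a non-UTF-8 encoding.
# ASSUMPTION: latin-1 is a placeholder; it maps every byte 0x00-0xFF to a code
# point, so it never fails, but it is only correct if the source really is latin-1.
raw_tags = b"news, 1\xba lugar, video"  # hypothetical sample data

try:
    raw_tags.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xba is a UTF-8 continuation byte, so it cannot start a character.
    print("utf-8 failed:", exc.reason)  # invalid start byte

# Decode once, at the input boundary, with the encoding the data really uses.
tags = raw_tags.decode("latin-1").split(", ")
print(tags)
```

Decoding once at the point where the data enters the program means everything after that works with text, and no later step has to guess the encoding.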
When I try:
check = video['tags'].split(', ')
for ta in check:
    tagger = xml.SubElement(doc, 'field', name="tag")
    tagger.text = ta
I get:
Traceback (most recent call last):
File "xml_breakup3.py", line 172, in <module>
tree.write(output)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 821, in write
serialize(write, self._root, encoding, qnames, namespaces)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 940, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 940, in _serialize_xml
_serialize_xml(write, e, encoding, qnames, None)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 938, in _serialize_xml
write(_escape_cdata(text, encoding))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1074, in _escape_cdata
return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'utf8' codec can't decode byte 0xba in position 0: invalid start byte
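The second traceback is the same decode failure, just deferred: tagger.text now holds raw bytes, and ElementTree's _escape_cdata calls text.encode(...) on them during tree.write, which again triggers an implicit UTF-8 decode. A common approach is to decode the tags to text once, assign text to .text, and let write() do the single encode step. Below is a Python 3 sketch; the root element name doc, the sample tag bytes and the latin-1 source encoding are assumptions standing in for the question's doc, video['tags'] and the real encoding.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input standing in for video['tags']; assumed to be latin-1 bytes.
raw_tags = b"news, 1\xba lugar, video"
tags = raw_tags.decode("latin-1").split(", ")

doc = ET.Element("doc")
for tag in tags:  # iterate directly: no index arithmetic, no skipped last item
    field = ET.SubElement(doc, "field", name="tag")
    field.text = tag  # assign str (text), never pre-encoded bytes

tree = ET.ElementTree(doc)
buf = io.BytesIO()
# write() performs the single encode step; a real file must be opened in "wb" mode.
tree.write(buf, encoding="utf-8", xml_declaration=True)
xml_bytes = buf.getvalue()
print(xml_bytes.decode("utf-8"))
```

With a real file, passing the path directly, e.g. tree.write(file_name, encoding="utf-8", xml_declaration=True), lets ElementTree open the file in binary mode itself.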
Answer (score: 0)
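You may want to try removing the str in front of the part you are encoding. When you use str, you convert what I assume is a Unicode value into a byte string, and then you try to encode that. If you leave it as Unicode and encode it directly, it should work: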
>>> s = u'\xba'
>>> print s
º
>>> s.encode('utf8')
'\xc2\xba'
>>> str(s).encode('utf8')
Traceback (most recent call last):
File "<pyshell#30>", line 1, in <module>
str(s).encode('utf8')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xba' in position 0: ordinal not in range(128)
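In Python 3 the same distinction is explicit: str is text and bytes is encoded data. A small sketch of the answer's point, written for Python 3: encoding the text 'º' to UTF-8 succeeds, while forcing it through ASCII (which is what Python 2's str() did implicitly with the default ASCII codec, as in the session above) fails.

```python
s = "\xba"  # the text character 'º' (U+00BA)

utf8_bytes = s.encode("utf-8")  # explicit encode of text: succeeds
print(utf8_bytes)  # b'\xc2\xba'

try:
    # Python 2's str(u'\xba') implicitly encoded with ASCII; this is the equivalent.
    s.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)  # ordinal not in range(128)

# Round trip: decoding with the same codec restores the original text.
assert utf8_bytes.decode("utf-8") == s
```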