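I have never dealt with encoding and decoding strings, so I am a complete beginner here. When I try to write content that I read from another file to a temporary file using file.write in Python, I get a UnicodeEncodeError. The error is: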
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 41333: ordinal not in range(128)
Here is what my code does. I read an XML file and get the text from the "mydata" tags. Then I iterate through mydata looking for CDATA.
import os
import tempfile
from lxml import etree  # strip_cdata is an lxml XMLParser option

parser = etree.XMLParser(strip_cdata=False)
root = etree.parse("myfile.xml", parser)
myData = root.findall('./mydata')
# iterate through list to find text (lua code) contained in elements containing CDATA
for item in myData:
    myCode = item.text
# Write myCode to a temporary file.
tempDirectory = tempfile.mkdtemp(suffix="", prefix="TEST_THIS_")
file = open(tempDirectory + os.path.sep + "myCode.lua", "w")
file.write(myCode + "\n")
file.close()
When I hit the following line, it fails with a UnicodeEncodeError:
file.write(myCode + "\n")
How do I encode and decode this correctly?
Answer 0 (score: 23)
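Python 2.7's open function does not handle unicode characters transparently the way Python 3's does. There is extensive documentation on this, but if you want to write unicode strings directly without encoding them yourself, you can try this: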
>>> import codecs
>>> f = codecs.open(filename, 'w', encoding='utf8')
>>> f.write(u'\u201c')
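Applied to the temporary-file code in the question, the same idea might look like the sketch below. It swaps in io.open for codecs.open (my substitution, not something the question used); io.open accepts an encoding argument on Python 2.6+ and is the same function as the built-in open on Python 3. The helper name write_code and the sample string are hypothetical stand-ins for the asker's item.text.

```python
import io
import os
import tempfile


def write_code(my_code, directory):
    """Write a unicode string to myCode.lua as UTF-8 and return the path."""
    path = os.path.join(directory, "myCode.lua")
    # io.open encodes on write, so no manual .encode() call is needed.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(my_code + u"\n")
    return path


temp_directory = tempfile.mkdtemp(suffix="", prefix="TEST_THIS_")
# Hypothetical stand-in for item.text; it contains the curly quotes
# (U+201C, U+201D) that the ascii codec rejects.
sample = u"print(\u201chello\u201d)"
written_path = write_code(sample, temp_directory)
```

The with block also closes the file if the write raises, which the explicit file.close() in the question would not do.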
For comparison, this is how the error occurs:
>>> f = open(filename, 'w')
>>> f.write(u'\u201c')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 0: ordinal not in range(128)
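The traceback comes from an implicit step: when a unicode object is written to a file opened with plain open on Python 2, it is first encoded with the default codec, which is ASCII. The sketch below reproduces that step explicitly, assuming nothing beyond the character from the error message, and shows that the same character encodes fine as UTF-8. The variable names are illustrative only.

```python
text = u"\u201c"  # LEFT DOUBLE QUOTATION MARK, the character in the traceback

# Explicitly do what Python 2's file.write does implicitly.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # U+201C is outside the 0-127 range that ASCII can represent.
    ascii_ok = False

# UTF-8 can represent any Unicode code point; this one takes three bytes.
utf8_bytes = text.encode("utf-8")
```

So another option on Python 2 is to keep the plain open call and write myCode.encode("utf-8") instead of myCode, at the cost of remembering to encode at every write.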