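I don't understand what Python 3.3 on Windows 8 is doing with Unicode. The file t.txt contains three bytes, hex E2 80 94, which is the UTF-8 encoding of an em dash (U+2014). I expected the code below to display that character. I don't understand why it doesn't, or why cp850.py is involved. Can anyone explain what is going on, and what I need to do to read Unicode from a text file? I'm too confused to ask a more specific question.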
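The two halves of the problem can be checked separately. This is a minimal sketch, assuming the file really holds the bytes E2 80 94. Decoding them as UTF-8 succeeds, and the error only appears when the resulting str is encoded with cp850, which has no em dash.

```python
# The three bytes from t.txt (assumed to be E2 80 94).
raw = bytes([0xE2, 0x80, 0x94])

# Decoding as UTF-8 works: the result is the single character U+2014.
text = raw.decode("utf-8")
print(hex(ord(text)))

# Encoding that character with cp850 (a DOS console code page) fails,
# which is the same error print() raised in the transcript.
try:
    text.encode("cp850")
except UnicodeEncodeError as exc:
    print("cp850 cannot encode", ascii(text), ":", exc.reason)
```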
>python
Python 3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:06:53) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open( 't.txt', encoding='utf-8' )
>>> s = f.readline()
>>> print(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python33\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2014' in position 0: character maps to <undefined>
>>>
>>> import sys
>>> sys.getfilesystemencoding()
'mbcs'
>>> sys.getdefaultencoding()
'utf-8'
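print() encodes text with sys.stdout.encoding, the console's code page (apparently cp850 here), not with sys.getdefaultencoding() or sys.getfilesystemencoding(). Opening the file with encoding='utf-8' was already correct. The following is a minimal sketch of a workaround, assuming you only need the output not to crash. `console_safe` is a hypothetical helper, not a standard function: it replaces characters the console cannot represent with backslash escapes.

```python
import sys

def console_safe(text, encoding=None):
    """Hypothetical helper: escape characters the target encoding
    cannot represent, so print() will not raise UnicodeEncodeError."""
    enc = encoding or sys.stdout.encoding or "utf-8"
    return text.encode(enc, errors="backslashreplace").decode(enc)

# Shows which encoding print() will actually use.
print("stdout encoding:", sys.stdout.encoding)

# On a cp850 console the em dash comes out as the escape \u2014.
print(console_safe("\u2014 em dash"))
```

Alternatively, the PYTHONIOENCODING environment variable overrides the encoding used for stdin, stdout and stderr. On Python 3.6 and later, the Windows console uses UTF-8 through the Unicode console API (PEP 528), so the original print(s) should work there.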