Python 2.7 character \u2013

Asked: 2013-12-02 13:49:51

Tags: python python-2.7 utf-8 windows-console

I have the following code:

# -*- coding: utf-8 -*-

print u"William Burges (1827–81) was an English architect and designer."

When I try to run it from cmd, I get the following:

Traceback (most recent call last):
  File "C:\Python27\utf8.py", line 3, in <module>
    print u"William Burges (1827ŌĆō81) was an English architect and designer."
  File "C:\Python27\lib\encodings\cp775.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position
 20: character maps to <undefined>

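How can I resolve this and get Python to handle this \u2013 character? And why doesn't Python handle it with the existing code? I thought utf-8 works for every character.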

Thanks

EDIT

This code prints the desired result:

# -*- coding: utf-8 -*-

print unicode("William Burges (1827-81) was an English architect and designer.", "utf-8").encode("cp866")

But when I try to print more than one sentence, for example:

# -*- coding: utf-8 -*-

print unicode("William Burges (1827–81) was an English architect and designer. I am here. ", "utf-8").encode("cp866")

I get the same error message:

Traceback (most recent call last):
  File "C:\Python27\utf8vs.py", line 3, in <module>
    print unicode("William Burges (1827ŌĆō81) was an English architect and desig
ner. I am here. ", "utf-8").encode("cp866")
  File "C:\Python27\lib\encodings\cp866.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position
 20: character maps to <undefined>

3 Answers:

Answer 0 (score: 2)

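I suspect the problem comes down to the print statement rather than anything inherent to Python (it works fine on my Mac). To print a string, it has to be converted into a format the terminal can display, and the longer dash you are using cannot be represented in the default character set of the Windows command line.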
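To make the point about the console character set concrete, here is a minimal sketch that checks which codecs can represent the sentence. The helper name `can_encode` is made up for this illustration; cp775 and cp866 are the code pages that appear in the tracebacks above.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# \u2013 is the en dash from the question, written as an escape so the
# example does not depend on the source file encoding.
text = u"William Burges (1827\u201381) was an English architect and designer."

def can_encode(s, codec):
    """Hypothetical helper: True if every character of s exists in codec."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

for codec in ("utf-8", "cp775", "cp866"):
    print(codec, can_encode(text, codec))
```

The utf-8 line should report True and the two console code pages False, which suggests the coding declaration is not at fault: the failure happens when print encodes the text for the console.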

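The difference between your two sentences is not their length but the dash used in "(1827-81)" versus "(1827–81)". Can you see the subtle difference? Try copying and pasting one over the other to check this.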
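Since the two dashes are hard to tell apart by eye, one way to check is to ask Python for their code points and Unicode names. This is only a small sketch using the standard `unicodedata` module; the variable name `dash_names` is illustrative.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import unicodedata

# The hyphen from the working example and the dash from the failing one.
dash_names = [(hex(ord(ch)), unicodedata.name(ch)) for ch in (u"-", u"\u2013")]

for code_point, name in dash_names:
    print(code_point, name)
```

The first character is the plain ASCII HYPHEN-MINUS (0x2d), the second is EN DASH (0x2013), which is the one the console code page cannot encode.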

See also Python, Unicode, and the Windows console

Answer 1 (score: 1)

There is a wiki article about this problem on wiki.python.org, https://wiki.python.org/moin/PrintFails, which explains why this can happen with the charmap codec.

Setting the PYTHONIOENCODING environment variable as described above can be used to suppress the error messages. Setting to "utf-8" is not recommended as this produces an inaccurate, garbled representation of the output to the console. For best results, use your console's correct default codepage and a suitable error handler other than "strict".
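As a rough illustration of "a suitable error handler other than strict", the sketch below encodes a fragment of the sentence for cp866 (the code page from the second traceback, assumed here to be the console's) with two of the standard handlers. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

text = u"(1827\u201381)"

# "replace" substitutes a question mark for anything the code page lacks.
replaced = text.encode("cp866", "replace")

# "backslashreplace" keeps the information as a visible escape sequence.
escaped = text.encode("cp866", "backslashreplace")

print(repr(replaced))
print(repr(escaped))
```

The same handler can also be attached to PYTHONIOENCODING in the form encoding:handler (for example cp866:replace), so that a plain print stops raising. Whether cp866 is the right code page depends on the console in use.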

Answer 2 (score: 0)

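Your string contains an ndash (en dash) symbol. It looks similar to the ASCII minus sign -, see symbol No. 45 in the ASCII table. Replace the ndash with the minus sign, because ASCII cannot contain an ndash. A working variant follows: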

# -*- coding: utf-8 -*-

my_string = "William Burges (1827–81) was an English architect and designer."
my_string = my_string.replace("–", "-")  # replace the UTF-8 symbol (ndash) with the ASCII minus (-)
print my_string

Output

William Burges (1827-81) was an English architect and designer.
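The variant above works on a byte string. A sketch of the same idea on a unicode string, assuming the text is kept as unicode as in the original question, might look like this (the name `ascii_text` is illustrative):

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

text = u"William Burges (1827\u201381) was an English architect and designer."

# Swap the en dash for the ASCII hyphen-minus before printing.
ascii_text = text.replace(u"\u2013", u"-")

# After the replacement the text fits in plain ASCII, so encoding it for
# a console code page such as cp775 or cp866 no longer fails.
ascii_text.encode("ascii")
print(ascii_text)
```

This trades typographic accuracy for portability, so it is only appropriate when losing the en dash is acceptable.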