Question

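I'm trying to print a poem from the Poetry Foundation's daily poem RSS feed on a thermal printer that supports the CP437 encoding. This means I need to translate some characters; in this case, an en-dash to a hyphen. But Python won't even encode the en-dash to begin with. When I try to decode the string and replace the en-dash with a hyphen, I get the following error: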

Traceback (most recent call last):
  File "pftest.py", line 46, in <module>
    str = str.decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 140: ordinal not in range(128)

Here is my code:

#!/usr/bin/python
#-*- coding: utf-8 -*-

# This string is actually a variable entitled d['entries'][1].summary_detail.value
str = """Love brought by night a vision to my bed,
One that still wore the vesture of a child
But eighteen years of age – who sweetly smiled"""

str = str.decode('utf-8')
str = str.replace("\u2013", "-") #en dash
str = str.replace("\u2014", "--") #em dash
print (str)

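I can actually get the output to print without errors in my terminal window (Mac) using the following code, but my printer spits out sets of 3 CP437 characters: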

str = u''.str.encode('utf-8')

I'm using Sublime Text as my editor, and I've saved the file with UTF-8 encoding, but I'm not sure that helps. I would greatly appreciate any help with this code. Thank you!

Answer 1

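I don't fully understand what's happening in your code, but I've also been trying to replace en-dashes with hyphens in a string I got from the web, and here's what works for me. My code is just this: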

txt = re.sub(u"\u2013", "-", txt)

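I'm using Python 2.7 and Sublime Text 2, but I don't bother setting -*- coding: utf-8 -*- in my script, because I'm trying not to introduce any new encoding issues. (Even though my variables may contain Unicode, I like to keep my code pure ASCII.) Do you need to include Unicode in your .py file, or is that just to help with debugging?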

I'll note that my txt variable is already a unicode string, i.e.,

print type(txt)

produces

<type 'unicode'>

I'd be curious to know what type(str) produces in your case.
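The type matters because of how Python 2 handles decode() on a value that is already unicode: it first encodes the value to bytes with the default ascii codec, and that implicit step is what raises UnicodeEncodeError on a character like the en-dash. Here is a minimal sketch of a guard, assuming the feed value may arrive as either bytes or unicode. The helper name ensure_unicode is made up for illustration, and the snippet is written to run unchanged on Python 2.7 and Python 3.

```python
def ensure_unicode(value, encoding="utf-8"):
    """Return value as a unicode string, decoding only if it is bytes."""
    # In Python 2, `bytes` is an alias for `str`; in Python 3 it is the bytes type.
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already unicode: nothing to decode. Calling .decode() here is the step
    # that triggers the implicit ascii encode in Python 2.
    return value


raw = b"age \xe2\x80\x93 who"   # UTF-8 bytes containing an en-dash
already = u"age \u2013 who"     # the same text as a unicode string

print(ensure_unicode(raw) == already)
print(ensure_unicode(already) == already)
```

If type(str) turns out to be unicode, the decode() call can simply be dropped.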

One thing I noticed in your code is

str = str.replace("\u2013", "-") #en dash

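Are you sure that does anything? My understanding is that \u only means "unicode character" inside a u"" string, and what you're creating there is a string of 6 characters: a backslash, a "u", a "2", a "0", and so on. (The backslash stays because, in a plain Python 2 string, \u has no special meaning the way '\n' or '\t' does, so Python leaves the sequence as it is.)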
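Counting the characters makes the difference visible. This is a small sketch rather than your actual code: the backslash is written out explicitly as "\\u2013" so that the snippet behaves the same on Python 2.7 and Python 3 (in Python 2, a plain "\u2013" literal amounts to the same thing).

```python
byte_style = b"\\u2013"     # what a plain "\u2013" literal amounts to in Python 2
unicode_style = u"\u2013"   # a real en-dash, one character

print(len(byte_style))      # 6
print(len(unicode_style))   # 1

text = u"age \u2013 who"
unchanged = text.replace(u"\\u2013", u"-")  # looks for backslash, u, 2, 0, 1, 3
fixed = text.replace(u"\u2013", u"-")       # looks for the en-dash itself

print(unchanged == text)         # the literal sequence never occurs in the text
print(fixed == u"age - who")
```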

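Also, the fact that you get 3 CP437 characters from your printer makes me suspect that you still have an en-dash in your string. The UTF-8 encoding of an en-dash is 3 bytes: 0xe2 0x80 0x93. When you call str.encode('utf-8') on a unicode string that contains an en-dash, you get those three bytes in the returned string. I'm guessing your terminal knows how to interpret them as an en-dash, and that's what you're seeing.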
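To see where the three characters come from, you can encode an en-dash as UTF-8 and then read those bytes back as CP437, which is roughly what the printer would be doing if it receives the UTF-8 bytes unchanged (that part is my assumption about the printer). A short sketch that runs on Python 2.7 and Python 3:

```python
en_dash = u"\u2013"

utf8_bytes = en_dash.encode("utf-8")
print(utf8_bytes == b"\xe2\x80\x93")   # the three UTF-8 bytes
print(len(utf8_bytes))                 # 3

# Interpret each byte as one CP437 character, as a CP437 device would.
as_cp437 = utf8_bytes.decode("cp437")
print(len(as_cp437))                   # 3
# Capital gamma, C with cedilla, o with circumflex.
print(as_cp437 == u"\u0393\u00c7\u00f4")
```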

If you can't get my first method to work, I'll mention that I've also had success with this:

txt = txt.encode('utf-8')
txt = re.sub("\xe2\x80\x93", "-", txt)

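Maybe that re.sub() would work for you if you put it after your call to encode(). In that case, you might not need the call to decode() at all. I'll confess that I don't really understand why it's there.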
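Putting the pieces together, one possible end-to-end approach is to do the replacements on a unicode string and only then encode for the printer. This is a sketch under assumptions, not a tested fix for your setup: the function name to_cp437 and the REPLACEMENTS table are made up for illustration, and I'm assuming that characters with no CP437 equivalent can be printed as "?". It runs on Python 2.7 and Python 3.

```python
# Characters the printer cannot show, mapped to ASCII stand-ins.
REPLACEMENTS = {
    u"\u2013": u"-",    # en dash
    u"\u2014": u"--",   # em dash
}


def to_cp437(text):
    """Return CP437 bytes for text, with dashes replaced by hyphens."""
    # Decode only if we were handed bytes (assumed to be UTF-8).
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    # "replace" turns anything CP437 cannot represent into "?".
    return text.encode("cp437", "replace")


poem = u"But eighteen years of age \u2013 who sweetly smiled"
print(to_cp437(poem))
```

Doing the replacement on unicode keeps the code independent of how many bytes each character takes in UTF-8, and the final encode gives the printer bytes in the one encoding it supports.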

Python: ascii codec can't encode en-dash
