Should StringIO(HighSurrogate) raise an error in Python 2.7?

Asked: 2019-04-03 09:12:04

Tags: python python-2.7 unicode encoding surrogate-pairs

When I run this Python 2.7 code (edit: updated the code):

import io
x = io.StringIO(u'\ud801')

CPython runs it without complaint, but IronPython throws the following error:

UnicodeEncodeError:
Unable to translate Unicode character \uD801 at index 0 to specified code page.

I assume this is because U+D801 is an unpaired surrogate and thus an invalid character, but which implementation is showing the correct behavior here? Should this code throw or not?
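To make "unpaired surrogate" concrete, here is a minimal sketch of a check for one. The helper name `has_lone_surrogate` is hypothetical (it is not part of either implementation), and it uses UTF-16 pairing rules: a high surrogate (U+D800 to U+DBFF) counts as paired only when a low surrogate (U+DC00 to U+DFFF) follows it immediately.

```python
def has_lone_surrogate(text):
    """Return True if text contains a surrogate that is not part of a
    high+low pair (hypothetical helper, UTF-16 pairing rules assumed)."""
    i = 0
    n = len(text)
    while i < n:
        cp = ord(text[i])
        if 0xD800 <= cp <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= ord(text[i + 1]) <= 0xDFFF:
                i += 2  # skip the whole pair
                continue
            return True
        if 0xDC00 <= cp <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            return True
        i += 1
    return False


lone = has_lone_surrogate(u'\ud801')          # the string from the question
paired = has_lone_surrogate(u'\ud801\udc00')  # high followed by low
```

With this check, the string from the question is flagged, while the same high surrogate followed by a low surrogate is not.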

1 Answer:

Answer 0 (score: 0)

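They are both correct, but they are not doing the same thing. IronPython appears to be trying to print the Unicode character and failing to convert it to the current code page. If you print the character, you get the same behavior in CPython 2.7: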

>>> import io
>>> io.StringIO(u'\ud801').getvalue()
u'\ud801'
>>> print(io.StringIO(u'\ud801').getvalue())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\ud801' in position 0: character maps to <undefined>
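The distinction can also be shown without a console. The sketch below assumes cp437 as a stand-in for the console code page (taken from the traceback above; yours may differ), and `try_encode` is a hypothetical helper. Storing the string in `io.StringIO` involves no encoding step, whereas encoding it to a code page is where the error comes from.

```python
import io

text = u'\ud801'  # the unpaired high surrogate from the question

# Storing and reading back text does not encode anything.
stored = io.StringIO(text).getvalue()


def try_encode(s, encoding):
    """Return True if s can be encoded with the given codec
    (hypothetical helper for illustration)."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp437 is assumed here as the console code page, as in the traceback.
encodable = try_encode(stored, 'cp437')

# An error handler such as 'backslashreplace' avoids the exception
# by emitting an escape sequence instead of the character.
escaped = stored.encode('cp437', 'backslashreplace')
```

Here `stored` round-trips unchanged and only the strict encode to the code page fails, which is consistent with the answer's reading that the error arises on output rather than in `StringIO` itself.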