Question

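According to the docs, the built-in string encoding string_escape:

Produce a string that is suitable as string literal in Python source code

...while unicode_escape:

Produce a string that is suitable as Unicode literal in Python source code

So they should have roughly the same behaviour. But they appear to treat single quotes differently: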

>>> print """before '" \0 after""".encode('string-escape')
before \'" \x00 after
>>> print """before '" \0 after""".encode('unicode-escape')
before '" \x00 after

string_escape escapes the single quote, while unicode_escape does not. Is it safe to assume that I can simply do:

>>> escaped = my_string.encode('unicode-escape').replace("'", "\\'")

...and get the expected behaviour?

Edit: To be clear, the expected behaviour is getting something suitable as a literal.

Answer 1

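According to my reading of the implementation of unicode-escape and the unicode repr in the CPython 2.6.5 source, yes; the only difference between repr(unicode_string) and unicode_string.encode('unicode-escape') is the inclusion of the wrapping quotes and the escaping of whichever quote character was used.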

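They are both driven by the same function, unicodeescape_string. This function takes a parameter whose sole purpose is to toggle the addition of the wrapping quotes and the escaping of that quote.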
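The same relationship can be observed from the outside, without reading the C source. The sketch below is only an illustration and runs on Python 3, not on the 2.6.5 build discussed above: it assumes ascii() as the closest Python 3 counterpart of the Python 2 unicode repr, and the helper names are hypothetical.

```python
def unicode_escaped(text):
    """Hypothetical helper: unicode_escape output as a str."""
    return text.encode("unicode_escape").decode("ascii")


def ascii_repr_body(text):
    """Hypothetical helper: ascii() output without the wrapping quotes."""
    return ascii(text)[1:-1]


# No quote characters: the two forms are identical.
plain = "tab\t \u00e9 \\ end"
print(ascii_repr_body(plain) == unicode_escaped(plain))

# Both quote characters present: ascii() wraps in single quotes and
# escapes the single quote, while unicode_escape leaves it alone.
quoted = "it's \"x\""
print(ascii_repr_body(quoted))
print(unicode_escaped(quoted))
```

For these sample strings the only difference is the escaped quote, which is consistent with the description above.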

Answer 2

Within the range 0 ≤ c < 128, yes, the ' is the only difference for CPython 2.6.

>>> set(unichr(c).encode('unicode_escape') for c in range(128)) - set(chr(c).encode('string_escape') for c in range(128))
set(["'"])

Outside of this range the two types are not interchangeable.

>>> '\x80'.encode('string_escape')
'\\x80'
>>> '\x80'.encode('unicode_escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

>>> u'1'.encode('unicode_escape')
'1'
>>> u'1'.encode('string_escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: escape_encode() argument 1 must be str, not unicode

On Python 3.x, the string_escape encoding no longer exists, since str can only store Unicode.
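On Python 3 the approach from the question can still be sketched with unicode_escape alone. This is a minimal sketch rather than a definitive implementation: the helper name escape_for_single_quoted_literal is hypothetical, and it assumes the result will be wrapped in single quotes as a str literal. ast.literal_eval is used only to check the round trip.

```python
import ast


def escape_for_single_quoted_literal(text):
    """Hypothetical helper: escape text for use inside a '...' literal."""
    # unicode_escape returns ASCII-only bytes; decode them back to str.
    escaped = text.encode("unicode_escape").decode("ascii")
    # unicode_escape leaves single quotes alone, so escape them by hand.
    return escaped.replace("'", "\\'")


original = "before '\" \0 after"
escaped = escape_for_single_quoted_literal(original)
print(escaped)

# Wrapping the result in single quotes and evaluating it as a literal
# should give back the original string.
round_tripped = ast.literal_eval("'" + escaped + "'")
print(round_tripped == original)
```

Because unicode_escape doubles every backslash before the quotes are handled, a backslash that directly precedes a quote in the input should not interfere with the added escape.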

Python "string_escape" vs "unicode_escape"