Is there a way to reproduce, in Python 2, Python 3's behavior of not escaping Unicode in repr()?
$ python3
>>> s="…\n…"
>>> print(repr(s))
'…\n…'
but
$ python2
>>> s=u"…\n…"
>>> print repr(s)
u'\u2026\n\u2026'
I want
u'…\n…'
The solution I managed to come up with is
#!/usr/bin/python
import re

_uregex = re.compile("\\\\([^uU])")

def _ureplace(x):
    x = x.group(1)
    if x == "\\":
        return "\\\\\\\\" # Eight of them. Required.
    return "\\\\"+x

def urepr(x):
    return _uregex.sub(_ureplace,repr(x)).decode("unicode-escape")

s = u"\u2026\n\u2026"
print(urepr(s))
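but I wonder whether there is a better way to do this. Escaping everything just to unescape it again seems rather wasteful. It is also slow (I need this to quickly write lots of large objects and their reprs to a log file).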
Answer 0 (score: 0)
I don't think Python 2 provides a way to do this, but it is easy to write your own.
import unicodedata

def unichr_repr(ch):
    if ch == '\\':
        return '\\\\'
    elif ch == "'":
        return "\\'"
    category = unicodedata.category(ch)
    if category == 'Cc':
        if ch == '\n':
            return '\\n'
        n = ord(ch)
        if n < 0x100:
            return '\\x%02x' % n
        if n < 0x10000:
            return '\\u%04x' % n
        return '\\U%08x' % n
    return ch

def unistr_repr(s):
    return "'" + ''.join(unichr_repr(ch) for ch in s) + "'"
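Since the question mentions speed, one possible variation is to let a compiled regular expression find only the characters that need escaping, so the Python-level callback runs just for those instead of for every character. This is a minimal sketch under stated assumptions, not a drop-in replacement for repr(): the names `fast_urepr`, `_NEEDS_ESCAPE`, `_SHORT_ESCAPES` and `_escape_match` are hypothetical, only backslash, single quote and the C0/C1 control characters are escaped (Python 3 also escapes other non-printable categories), and the result always uses single quotes with a `u` prefix. It is written to run on both Python 2 and Python 3.

```python
import re

# Assumption: only backslash, single quote and C0/C1 control characters
# (including DEL) are escaped; everything else is passed through untouched.
_NEEDS_ESCAPE = re.compile(u"[\\\\'\\x00-\\x1f\\x7f-\\x9f]")

# Short escapes for the most common cases.
_SHORT_ESCAPES = {
    u"\\": u"\\\\",
    u"'": u"\\'",
    u"\n": u"\\n",
    u"\r": u"\\r",
    u"\t": u"\\t",
}

def _escape_match(match):
    """Return the escaped form of the single character that matched."""
    ch = match.group(0)
    short = _SHORT_ESCAPES.get(ch)
    if short is not None:
        return short
    # Every remaining character matched by the pattern is below 0x100.
    return u"\\x%02x" % ord(ch)

def fast_urepr(s):
    """Build a u'...' literal for s, keeping printable non-ASCII text as is."""
    return u"u'" + _NEEDS_ESCAPE.sub(_escape_match, s) + u"'"

# Hypothetical usage: the ellipsis characters stay literal, the newline
# becomes a backslash followed by "n".
literal = fast_urepr(u"\u2026\n\u2026")
```

Whether this is actually faster for a given workload would need to be measured; the intent is only that strings with few special characters mostly stay inside the regex engine.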
Answer 1 (score: 0)
Here is a more complete solution that also works for lists of unicode strings:
import sys
import unicodedata

try:
    import reprlib  # Python 3
except ImportError:
    import repr as reprlib  # the same standard-library module under its Python 2 name

class URepr(reprlib.Repr):
    """
    On python 3, repr returns unicode objects, which means that non-ASCII
    characters are rendered in human readable form.
    This provides a similar facility on python 2.
    Additionally, on python 3, it prefixes unicode repr with a u, such that
    the returned repr is a valid unicode literal on both python 2 and python 3
    """
    # From https://github.com/python/cpython/blob/3.6/Objects/unicodectype.c#L147-L1599
    nonprintable_categories = ('Cc', 'Cf', 'Cs', 'Co', 'Cn', 'Zl', 'Zp', 'Zs')

    if sys.version_info.major >= 3:
        def repr_str(self, obj, level):
            return 'u' + super().repr_str(obj, level)
    else:
        def repr_unicode(self, obj, level):
            def _escape(ch):
                # printable characters that have special meanings in literals
                if ch == u'\\':
                    return u'\\\\'
                elif ch == u"'":
                    return u"\\'"
                # non-printable characters - convert to \x.., \u...., \U........
                category = unicodedata.category(ch)
                if category in self.nonprintable_categories:
                    return ch.encode('unicode-escape').decode('ascii')
                # everything else
                return ch
            return u"u'{}'".format(''.join(_escape(c) for c in obj))
It can be used as:
repr = URepr().repr
repr([u'hello', u'world'])
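One thing to keep in mind for the log-file use case: the standard library's Repr class is designed to abbreviate, so a subclass such as URepr inherits its size limits (for example, lists longer than maxlist are cut short, and on Python 3 strings longer than maxstring are shortened). The sketch below uses a plain Repr instance to show the effect and how raising the limits avoids it; the limit value of 1000 is an arbitrary assumption for illustration.

```python
try:
    import reprlib  # Python 3
except ImportError:
    import repr as reprlib  # Python 2 name of the same module

r = reprlib.Repr()

# With the default limits, long values are abbreviated around a "..." marker.
truncated_text = r.repr("x" * 100)
truncated_list = r.repr(list(range(10)))

# Raising the limits (1000 is an arbitrary choice) keeps these values intact.
r.maxstring = 1000
r.maxlist = 1000
full_text = r.repr("x" * 100)
full_list = r.repr(list(range(10)))
```

The same attributes can be set on a URepr instance before using its repr method; other container types have their own limits (maxtuple, maxdict, maxlevel and so on).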
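Answer 2 (score: 0)
While I understand that you would like to use your own approach, if you are receiving the Unicode as numeric values (code points), may I suggest using the chr() function (unichr() on Python 2)?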
Answer 3 (score: -1)
Try
repr(string).decode("utf-8")
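A short check may help when evaluating this suggestion. On Python 2 the output of repr() for a unicode string is pure ASCII containing backslash escapes, so decoding it as UTF-8 would be expected to leave the escapes in place, whereas the "unicode-escape" codec is what turns them back into characters. The sketch below emulates the escaped text with the "unicode-escape" codec so that it runs on both Python 2 and Python 3; the variable names are illustrative only.

```python
s = u"\u2026\n\u2026"

# ASCII-only bytes holding backslash escapes, similar to the body of
# Python 2's repr() output for this string.
escaped = s.encode("unicode-escape")

# Decoding ASCII bytes as UTF-8 does not interpret the escapes.
as_utf8 = escaped.decode("utf-8")

# The "unicode-escape" codec interprets them and restores the characters.
as_unescaped = escaped.decode("unicode-escape")

print(as_utf8 == u"\\u2026\\n\\u2026")
print(as_unescaped == s)
```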