Question

I have an object that contains unicode data, and I want to use that data in its representation, e.g.:

# -*- coding: utf-8 -*-

class A(object):

    def __unicode__(self):
        return u"©au"

    def __repr__(self):
        return unicode(self).encode("utf-8")

    __str__ = __repr__ 

a = A()


s1 = u"%s"%a # works
#s2 = u"%s"%[a] # gives unicode decode error
#s3 = u"%s"%unicode([a])  # gives unicode decode error

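Now, even if I return unicode from __repr__, it still gives an error. So the question is: how do I take a list of such objects and build another unicode string from it?

Platform details: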

"""
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
'Linux-2.6.24-19-generic-i686-with-debian-lenny-sid'
"""

I'm also not sure why:

print a # works
print unicode(a) # works
print [a] # works
print unicode([a]) # doesn't work

The comp.lang.python group answered this question: http://groups.google.com/group/comp.lang.python/browse_thread/thread/bd7ced9e4017d8de/2e0b07c761604137?lnk=gst&q=unicode#2e0b07c761604137

Answer 1

s1 = u"%s"%a # works

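This works because, when handling 'a', the formatting uses its unicode representation (i.e. the __unicode__ method).

When you wrap it in a list such as '[a]', however, and try to put that list into the string, what gets called is unicode([a]). For a list that is the same as its repr, and the string representation of a list uses 'repr(a)' to represent each item in its output. This causes a problem because you are passing a 'str' object (a byte string) that contains the utf-8 encoded version of 'a'. When the string formatting tries to embed it in your unicode string, it tries to convert it back to a unicode object using the default encoding, i.e. ASCII. Since ASCII has no such character, the conversion fails.

To do what you want, you would have to write: u"%s" % repr([a]).decode('utf-8'), assuming all your elements encode to utf-8 (or ASCII, which from unicode's point of view is a subset of utf-8).

For a better solution (if you still want the string to look like the str of a list), you would have to use what was suggested previously and use join, something like this: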

u'[%s]' % u','.join(unicode(x) for x in [a,a])

Note that this won't take care of lists that contain lists of your A objects.
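For the nested case, here is a minimal sketch of a recursive variant. The name to_text is hypothetical (it is not from the thread), and to keep the snippet runnable on both Python 2 and Python 3 it calls __unicode__ directly instead of relying on the unicode() builtin:

```python
text_type = type(u"")  # unicode on Python 2, str on Python 3


class A(object):
    def __unicode__(self):
        return u"\xa9au"  # same text as in the question, written with an escape


def to_text(obj):
    """Hypothetical helper: like unicode(), but recurses into lists."""
    if isinstance(obj, list):
        return u"[%s]" % u", ".join(to_text(item) for item in obj)
    if hasattr(obj, "__unicode__"):
        return obj.__unicode__()
    return text_type(obj)


a = A()
flat = to_text([a, a])
nested = to_text([a, [a, a]])
```

Here nested keeps the inner brackets, so the result still looks like a list of lists. Only list is handled; tuples, dicts and so on would need their own branches.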

My explanation sounds terribly unclear, but I hope you can make some sense of it.

Answer 2

Try:

s2 = u"%s"%[unicode(a)]

Your main problem is that you are doing more conversions than you expect. Consider the following:

s2 = u"%s"%[a] # gives unicode decode error

From the Python documentation:

    's'     String (converts any python object using str()).
    If the object or format provided is a unicode string, 
    the resulting string will also be unicode.

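When the %s format string is processed, str([a]) is applied. At that point what you have is a str object containing a sequence of UTF-8 encoded bytes. If you try to print it there is no problem, because the bytes are passed straight through to your terminal and rendered by the terminal.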

>>> x = "%s" % [a]
>>> print x
[©au]

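The problem arises when you try to convert it back to unicode. Essentially, the unicode function is being called on a string that contains a UTF-8 encoded byte sequence, and that is what causes the ascii codec to fail.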

    >>> u"%s" % x
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1: ordinal not in range(128)
    >>> unicode(x)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1: ordinal not in range(128)

Answer 3

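First of all, ask yourself what you're trying to achieve. If all you want is a round-trippable representation of the list, you should simply do the following: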

class A(object):
    def __unicode__(self):
        return u"©au"
    def __repr__(self):
        return repr(unicode(self))
    __str__ = __repr__

>>> A()
u'\xa9au'
>>> [A()]
[u'\xa9au']
>>> u"%s" % [A()]
u"[u'\\xa9au']"
>>> "%s" % [A()]
"[u'\\xa9au']"
>>> print u"%s" % [A()]
[u'\xa9au']

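That's how it is supposed to work. The string representation of a Python list is not something a user should see, so it makes sense for it to contain escaped characters.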
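To illustrate what "round-trippable" means here, the following small sketch parses a repr back into the original value. Using ast.literal_eval for the parsing step is my own choice for the illustration, not something the answer prescribes; the snippet should run on Python 2.6+ and Python 3:

```python
import ast

text = u"\xa9au"
items = [text, u""]

as_repr = repr(items)                  # programmer-facing form, may contain escapes
restored = ast.literal_eval(as_repr)   # parse the repr back into Python objects

# An empty list and a list holding one empty string stay distinguishable.
empty_differs = repr([]) != repr([u""])
```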

Answer 4

If you want to build a unicode string from a list of objects that support unicode(), try:

u''.join([unicode(v) for v in [a,a]])

Answer 5

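Since this question involves a lot of confusing unicode stuff, I thought I'd offer an analysis of what is going on here.

It all comes down to the implementation of __unicode__ and __repr__ of the built-in list class. Basically, it is equivalent to: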

class list(object):
    def __repr__(self):
        return "[%s]" % ", ".join(repr(e) for e in self.elements)
    def __str__(self):
        return repr(self)
    def __unicode__(self):
        return str(self).decode()

Actually, list doesn't even define the __unicode__ and __str__ methods, which makes sense when you think about it.

When you write:

u"%s" % [a]                          # it expands to
u"%s" % unicode([a])                 # which expands to
u"%s" % repr([a]).decode()           # which expands to
u"%s" % ("[%s]" % repr(a)).decode()  # (simplified a little bit)
u"%s" % ("[%s]" % unicode(a).encode('utf-8')).decode()

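The last line is the expansion of repr(a), using the implementation of __repr__ from the question.

As you can see, the object is first encoded as utf-8, only to be decoded later with the system default encoding, which usually does not support all characters.

As some other answers mentioned, you can write your own function, or even subclass list, like this: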
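The encode-then-decode chain can be written out with explicit byte strings. This is only a sketch: the explicit decode("ascii") stands in for the implicit default-codec decode that Python 2 performs, and the variable names are mine. It should run on Python 2.6+ and Python 3:

```python
text = u"\xa9au"                    # what A.__unicode__ returns
repr_a = text.encode("utf-8")       # what the question's A.__repr__ returns
list_repr = b"[" + repr_a + b"]"    # roughly what repr([a]) produces

try:
    list_repr.decode("ascii")       # stand-in for the implicit default decode
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the codec that was used for encoding succeeds.
fixed = list_repr.decode("utf-8")
```

The error reports position 1, the first byte after the opening bracket, which matches the "byte 0xc2 in position 1" message in the tracebacks quoted in this thread.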

class mylist(list):
    def __unicode__(self):
        return u"[%s]" % u", ".join(map(unicode, self))

Note that this format is not round-trippable. It can even be misleading:

>>> unicode(mylist([]))
u'[]'
>>> unicode(mylist(['']))
u'[]'

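Of course, you can write a quote_unicode function to make it round-trippable, but this is the moment to ask yourself what's the point. The unicode and str functions are meant to create a representation of an object that makes sense to a user. For programmers, there's the repr function. Raw lists are not something a user should ever see. That's why the list class does not implement the __unicode__ method.

To get a better idea of what happens when, play with this little class: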
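For what it's worth, a quote_unicode function along those lines might look like the sketch below. The quoting rules (double quotes, backslash escaping) and the format_items helper are my own assumptions; the answer only names the function:

```python
def quote_unicode(value):
    """Hypothetical helper: wrap text in double quotes, escaping \\ and "."""
    escaped = value.replace(u"\\", u"\\\\").replace(u'"', u'\\"')
    return u'"%s"' % escaped


def format_items(items):
    """Format a list of unicode strings so that [] and [u''] differ."""
    return u"[%s]" % u", ".join(quote_unicode(item) for item in items)


empty = format_items([])
one_empty = format_items([u""])
one_item = format_items([u"\xa9au"])
```

With quoting in place, an empty list and a list containing one empty string no longer produce the same output.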

class B(object):
    def __unicode__(self):
        return u"unicode"
    def __repr__(self):
        return "repr"
    def __str__(self):
        return "str"


>>> b = B()
>>> b
repr
>>> [b]
[repr]
>>> unicode(b)
u'unicode'
>>> unicode([b])
u'[repr]'

>>> print b
str
>>> print [b]
[repr]
>>> print unicode(b)
unicode
>>> print unicode([b])
[repr]

Answer 6

# -*- coding: utf-8 -*-

class A(object):
    def __unicode__(self):
        return u"©au"

    def __repr__(self):
        return unicode(self).encode('ascii', 'replace')

    __str__ = __repr__

a = A()

>>> u"%s" % a
u'\xa9au'
>>> u"%s" % [a]
u'[?au]'

Answer 7

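__repr__ and __str__ are both supposed to return str objects, at least up to Python 2.6.x. You're getting the decode error because repr() tries to convert your result into a str, and that fails.

I believe this changed in Python 3.x.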
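As a quick check of that, here is a sketch that assumes Python 3, where str is already a unicode type and __repr__ may return non-ASCII text. It is not meant to run on Python 2:

```python
# Python 3 only: str is unicode, so no encode/decode step is involved.
class A(object):
    def __str__(self):
        return u"\xa9au"

    __repr__ = __str__


a = A()
s1 = "%s" % a      # uses str(a)
s2 = "%s" % [a]    # uses str([a]), which calls repr(a) for each item
```

Both lines work without any decode error on Python 3, since there is no byte string in the chain.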

How to use a list of Python objects whose representation is unicode
