Question

Hello, I want to save a string in a variable:

  msg=_(u'Uživatel <a href="{0}">{1} {2}</a>').format(request.user.get_absolute_url, request.user.first_name, request.user.last_name)

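But because the inserted variables contain accented characters such as š, I get a UnicodeDecodeError, even though I have set the encoding with # -*- coding: utf-8 -*-.

However, it works when I build the string by concatenating the variables like this (which seems odd to me, IMHO):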

msg=u'Uživatel <a href="' + request.user.get_absolute_url + ...

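I don't see why it shouldn't work. This is a running project, and I have to use statements like this many times.

I would be very grateful for any suggestions on how to fix this.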

Answer 1

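One of your user lookups is returning an encoded byte string rather than a Unicode object.

When Python 2.x is asked to combine a Unicode object with an encoded byte string, it does so by decoding the byte string to Unicode using the default encoding, which is ascii unless you go to some effort to change it. The # -*- coding: utf-8 -*- directive sets the encoding of the source code, but not the system default encoding.

From testing format, it looks like it tries to convert the arguments to match the type of the left-hand side.

Under 2.x, things work fine as long as the byte strings you use can be decoded as ascii: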
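To make the implicit step visible, here is a minimal sketch that performs the same ascii decode explicitly. It is an illustration rather than part of the original answer: the variable names are made up, and the decode is spelled out so the snippet also runs on Python 3, which never does this coercion on its own.

```python
# Illustrative sketch: do explicitly what Python 2 does implicitly when a
# byte string meets a Unicode object. All names here are hypothetical.

encoded_name = u'\u0161'.encode('utf-8')  # the UTF-8 bytes for "š"

# The implicit coercion uses the default codec, ascii, which rejects byte 0xc5.
try:
    encoded_name.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding with the codec the bytes were actually written in succeeds.
decoded_name = encoded_name.decode('utf-8')
message = u'U\u017eivatel {0}'.format(decoded_name)
```

The source-file coding directive plays no part in either decode call, which is why adding it does not make the error go away.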

>>> u'test\u270c {0}'.format('bar')
u'test\u270c bar'

Or, of course, if you format in another Unicode object:

>>> u'test\u270c {0}'.format(u'bar\u270d')
u'test\u270c bar\u270d'

If you leave off the u before the format string, you get a UnicodeEncodeError:

>>> 'foo {0}'.format(u'test\u270c')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u270c' in position 4: ordinal not in range(128)

Conversely, if you format an encoded string containing non-ascii bytes into a Unicode object, you get a UnicodeDecodeError:

>>> u'foo {0}'.format(u'test\u270c'.encode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)

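I would start by checking the get_absolute_url implementation. A valid URL should never contain unescaped non-ascii characters, so it should always be decodable as ascii. And if you are using something built from the standard Django models, first_name and last_name should be Unicode objects. So my first bet is a faulty get_absolute_url implementation.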

Answer 2

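Check the types of the arguments being formatted; my guess is that they are 'str', not 'unicode'. Decode them with the proper encoding before using them, for example: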

url = request.user.get_absolute_url()  # call the method to get the URL string
if isinstance(url, str):
    print 'url was str'
    url = url.decode('utf-8')  # decode the byte string into a unicode object
msg = u'Uživatel <a href="{0}">...</a>'.format(url)

(The if and print statements are for demonstration purposes only.) Treat the other values the same way.
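If several values need the same treatment, the check can be wrapped in a small helper. This is only a sketch: to_text is a hypothetical name, not a Django or standard-library function, and it assumes the byte strings are UTF-8 encoded, which the thread does not confirm.

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value


# Hypothetical sample values standing in for the user's attributes.
first_name = to_text(b'Tom\xc3\xa1\xc5\xa1')  # UTF-8 bytes for "Tomáš"
last_name = to_text(u'Nov\xe1k')              # already text, returned unchanged
msg = u'{0} {1}'.format(first_name, last_name)
```

Decoding once at the boundary keeps every later format call working on text only, so the ascii fallback never comes into play.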

Answer 3

The solution was very simple: I was using get_absolute_url instead of get_absolute_url(). Sorry to bother you.
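For anyone who hits the same thing, this minimal sketch shows what the missing parentheses change. FakeUser is a hypothetical stand-in, not Django's User model, and the URL is invented for the example.

```python
class FakeUser(object):
    """Hypothetical stand-in for request.user; not Django's User model."""

    def get_absolute_url(self):
        return u'/users/1/'


user = FakeUser()

# Missing parentheses: the bound method object itself gets formatted,
# so its textual representation ends up in the string.
wrong = u'<a href="{0}">'.format(user.get_absolute_url)

# With parentheses: the method is called and its return value is formatted.
right = u'<a href="{0}">'.format(user.get_absolute_url())
```

The representation of a bound method includes the representation of the object it is bound to. If that contained non-ascii bytes for the real user, it would account for the UnicodeDecodeError, though that last step is an inference and not something confirmed in this thread.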

Python format UnicodeDecodeError
