Question

I have a piece of code that builds many strings like this:

mystring = "%s%s/%s/%s" % (str1, str2, str3, str4)
# str1, str2 and str3 are directory names
# str4 is a file name

This worked fine until recently, when it started failing in one particular environment.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

I noticed that, in my case, both str1 and str4 have unicode characters.

# str1 is returned from db1. It is marked as unicode although
# it has only ascii characters. E.g. /mnt/s14691711010z1device1
# str2 and str3 are returned from db2 and are ascii strings only.
# E.g. AUTH_5a6625ef15144a1896b420b3374cca39, container1
# str4 is a file name which can contain unicode characters in any language
# (Korean and Japanese in the current case). E.g. file5ßðè

Removing either of them from the concatenation makes the failure go away, but that is not what I want.

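I cannot understand why a single unicode string passes but two of them together fail. I basically need to build a file path and check whether it exists. The values come from the db.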
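One possible explanation, sketched under assumptions that are not confirmed above: the code runs on Python 2, str1 is a unicode object, and str4 is a UTF-8 encoded byte string. In that case, as soon as one operand of % is unicode, Python 2 decodes the byte-string operands with the ascii codec, and a non-ascii byte makes that decode fail. The snippet below performs the decode explicitly so the same error can be seen on Python 2 or 3. The sample value and the helper name implicit_decode are made up for illustration.

```python
# -*- coding: utf-8 -*-

# Hypothetical stand-in for str4: a file name stored as UTF-8 bytes.
str4_bytes = u"file5\u00df".encode("utf-8")

def implicit_decode(value):
    """Mimic the ascii decode Python 2 applies to a byte string
    when it is mixed with a unicode object."""
    return value.decode("ascii")

# An ascii-only byte string (like str2 or str3) decodes without trouble.
ok = implicit_decode(b"container1")

# A byte string holding non-ascii bytes raises UnicodeDecodeError.
try:
    implicit_decode(str4_bytes)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

If this is what happens, it would also explain why dropping either string helps: without str1 every operand is a byte string and nothing is decoded, and without str4 there are no non-ascii bytes to decode.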

What is the best way to solve this? My knowledge of unicode is very limited. Should it be encode() followed by decode()?

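A simple workaround is to cast str1 with str(), since it contains only ascii characters. But I would like to understand why concatenating two unicode strings fails while one works.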
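As an alternative to the str() cast, here is a minimal sketch that decodes every byte string to text explicitly before formatting, so no implicit ascii decode is left to happen. It assumes the byte strings coming from the databases are UTF-8 encoded, which the post does not state. to_text and build_path are hypothetical helper names, and the file name is a made-up sample.

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding="utf-8"):
    """Return value as text. The UTF-8 default is an assumption."""
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value

def build_path(str1, str2, str3, str4):
    """Build the path from the question with all parts as text."""
    parts = tuple(to_text(p) for p in (str1, str2, str3, str4))
    return u"%s%s/%s/%s" % parts

path = build_path(
    u"/mnt/s14691711010z1device1",            # unicode, ascii-only (str1)
    b"AUTH_5a6625ef15144a1896b420b3374cca39",  # byte string (str2)
    b"container1",                             # byte string (str3)
    u"file5\u00df".encode("utf-8"),            # UTF-8 bytes, sample str4
)
```

The str() cast on str1 only keeps working while str1 stays ascii-only. Under the same assumption, the opposite direction is also possible: encode str1 to UTF-8 bytes so that every operand is a byte string.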

Tagged: python
