Question

The Python Unicode HOWTO says:

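Note that the arguments to these [string] methods can be Unicode strings or 8-bit strings. 8-bit strings will be converted to Unicode before carrying out the operation; Python's default ASCII encoding will be used, so characters greater than 127 will cause an exception: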

>>> s.find('Was\x9f')
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 3: ordinal not in range(128)
>>> s.find(u'Was\x9f')
-1

https://docs.python.org/2/howto/unicode.html

So you would assume that a unicode string can take a unicode string in its find / replace / count functions, but it does not seem to be that simple. Look at this in the Python console:

>>> type(u'hi')
<type 'unicode'>
>>> type('i')
<type 'str'>
>>> type('mÑ')
<type 'str'>
>>> u'hi'.replace('i','m')
u'hm'
>>> u'hi'.replace('i','mÑ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
>>> 'hi'.replace('i','mÑ')
'hm\xc3\x91'

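So, is the best way to do this kind of replacement correctly on any kind of string an if/else case: all str values when the input is a str, or all u'values' when str(type(input)) contains 'unicode'?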

What is worse, this does not seem to do anything, with or without the "u" before the string:

print(u"<head><script>%s</script>" % (thevariable,))

but only when I use from __future__ import unicode_literals??

Answer 1

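In the case of u'hi'.replace('i','mÑ'), you have a Unicode string, so replace expects Unicode string arguments. Both arguments are byte strings, so they are converted using the default ascii codec, and Ñ is not ASCII.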
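The implicit conversion described above can be imitated by hand. The following is a minimal sketch, written so that it also runs on Python 3, under the assumption that the bytes behind 'mÑ' are UTF-8 (which matches the \xc3\x91 shown in the question). The variable names are made up for illustration.

```python
# Bytes stored for the literal 'm' + N-tilde when the source or terminal is UTF-8
# (assumption: UTF-8, matching the \xc3\x91 seen in the question).
raw = b'm\xc3\x91'

# Roughly what Python 2 attempts implicitly when a byte string is passed
# to a method of a unicode string: decode it with the ascii codec.
try:
    raw.decode('ascii')
    implicit_error = None
except UnicodeDecodeError as exc:
    implicit_error = exc

# Decoding explicitly with the actual encoding sidesteps the ASCII default.
replacement = raw.decode('utf-8')
result = u'hi'.replace(u'i', replacement)

print(type(implicit_error).__name__)  # the ascii decode fails
print(result == u'hm\xd1')            # the explicit decode works
```

The failure is reported for byte 0xc3 at position 1, the same byte and position as in the traceback from the question.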

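In the case of 'hi'.replace('i','mÑ'), you have a byte string, so replace expects byte strings. That is what you gave it, so it works. In Python 2.7, byte strings containing non-ASCII characters are encoded in the source encoding, so I expect you have a #coding:utf8 at the top of your script and saved the source as UTF-8, since \xc3\x91 is the UTF-8 encoding of Ñ.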
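To see why the bytes depend on the encoding, here is a small sketch that encodes the same character two ways. Latin-1 is only a comparison point chosen for illustration; nothing in the question suggests it was used.

```python
# N-tilde as a Unicode code point (U+00D1), written with an escape so the
# snippet itself stays pure ASCII.
n_tilde = u'\xd1'

utf8_bytes = n_tilde.encode('utf-8')      # two bytes: 0xC3 0x91
latin1_bytes = n_tilde.encode('latin-1')  # one byte: 0xD1 (comparison only)

print(utf8_bytes == b'\xc3\x91')
print(latin1_bytes == b'\xd1')
```

The two-byte UTF-8 form is what appears in the 'hm\xc3\x91' result from the question.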

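Python 3 disallows non-ASCII characters in byte string constants (you can still embed hex escapes such as b'm\xc3\x91', but b'mÑ' is an error) and does not perform implicit encoding/decoding when the wrong type is used, so it helps to separate the two concerns cleanly.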
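As a rough illustration of that separation, the sketch below shows Python 3 rejecting a mixed call, plus one possible way to normalize inputs to text before replacing instead of branching on the type at every call site. The helpers to_text and replace_text are hypothetical names, not library functions, and the UTF-8 default is an assumption about where the bytes came from.

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return text, decoding byte strings with an assumed encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def replace_text(source, old, new, encoding='utf-8'):
    """Hypothetical helper: normalize all three arguments to text, then replace."""
    return to_text(source, encoding).replace(to_text(old, encoding),
                                             to_text(new, encoding))


# On Python 3, mixing text and bytes is a TypeError rather than an implicit decode.
try:
    'hi'.replace('i', b'm\xc3\x91')
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

# Decoding at the boundary makes the same replacement work.
fixed = replace_text('hi', b'i', b'm\xc3\x91')

print(type(mixed_error).__name__)
print(fixed == u'hm\xd1')
```

The design choice here is to decode once, as early as possible, and keep everything as text afterwards. Whether UTF-8 is the right encoding depends on the source of the bytes.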

