Question

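Python 3 changed its Unicode behavior to reject surrogate pairs, whereas Python 2 did not.

There is a related question here

But it does not offer a solution for how to remove surrogate pairs in Python 2, or how to do surrogate escaping.

Python 3 example: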

>>> a = b'\xed\xa0\xbd\xe4\xbd\xa0\xe5\xa5\xbd'
>>> a.decode('utf-8', 'surrogateescape')
'\udced\udca0\udcbd你好'
>>> a.decode('utf-8', 'ignore')
'你好'

Here '\xed\xa0\xbd' is not valid UTF-8. I want to either ignore these bytes or escape them.

Is it possible to do the same thing in Python 2?

Answer 1

There is no built-in solution, but python-future includes an implementation of surrogateescape: https://github.com/PythonCharmers/python-future

Add from future.utils.surrogateescape import register_surrogateescape to your imports, then call register_surrogateescape(). After that you can use the errors='surrogateescape' error handler in both encode and decode.
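To illustrate what registering such a handler involves, here is a minimal decode-only sketch that uses nothing but the standard codecs module. It is not python-future's actual implementation: the handler name 'surrogateescape_sketch' and the function escape_undecodable_bytes are hypothetical names chosen for this example. It assumes the PEP 383 mapping, where an undecodable byte 0xNN becomes the lone surrogate U+DCNN.

```python
import codecs
import sys

# Pick the "code point -> character" function for Python 2 or Python 3.
_to_char = unichr if sys.version_info[0] == 2 else chr


def escape_undecodable_bytes(exc):
    """Decode-only handler: map each bad byte 0xNN to the surrogate U+DCNN."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc  # encoding is not covered by this sketch
    bad = bytearray(exc.object[exc.start:exc.end])
    if any(b < 0x80 for b in bad):
        raise exc  # only bytes >= 0x80 are escaped, as in PEP 383
    replacement = u''.join(_to_char(0xDC00 + b) for b in bad)
    # Resume decoding right after the bytes that were escaped.
    return replacement, exc.end


# Hypothetical handler name, chosen so it cannot clash with a real one.
codecs.register_error('surrogateescape_sketch', escape_undecodable_bytes)

raw = b'\xff\xe4\xbd\xa0\xe5\xa5\xbd'  # one invalid byte, then valid UTF-8
text = raw.decode('utf-8', 'surrogateescape_sketch')
print(repr(text))
```

Note that the example uses \xff rather than the bytes from the question. As far as I know, Python 2's UTF-8 decoder accepts '\xed\xa0\xbd' as an encoded surrogate without raising an error, so no error handler would be called for it there.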

An example can be found here
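For the "ignore" half of the question, one possible approach that needs no error handler is to decode first and then strip any surrogate code points from the result. This is only a sketch: strip_surrogates is a hypothetical helper name, and on narrow Python 2 builds, where characters outside the BMP are stored as surrogate pairs, it would remove those characters as well.

```python
import re

# Matches any code point in the UTF-16 surrogate range U+D800..U+DFFF.
_SURROGATES = re.compile(u'[\ud800-\udfff]')


def strip_surrogates(text):
    """Return the unicode string with every surrogate code point removed."""
    return _SURROGATES.sub(u'', text)


# A lone surrogate followed by two ordinary characters (U+4F60, U+597D).
cleaned = strip_surrogates(u'\ud83d\u4f60\u597d')
print(repr(cleaned))
```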

How to do surrogateescape in Python 2
