Question

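I am converting a large project from Python 2 to Python 3 (backward compatibility with Python 2 is not required).

While testing the conversion, I found that some strings were being turned into bytes objects, which caused trouble. I traced it back to the following method, which is called in many places: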

def custom_format(val):
    return val.encode('utf8').strip().upper()

In Python 2:

custom_format(u'\xa0')
# '\xc2\xa0'
custom_format('bar')
# 'BAR'

In Python 3:

custom_format('\xa0')
# b'\xc2\xa0'
custom_format('bar')
# b'BAR'

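This is a problem because at some point the output of custom_format is inserted into a SQL template string using format(), but 'foo = {}'.format(custom_format('bar')) == "foo = b'BAR'", which breaks the SQL syntax.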

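Simply removing the encode('utf8') part makes custom_format('bar') correctly return 'BAR', but now custom_format('\xa0') returns '\xa0' instead of the '\xc2\xa0' that the Python 2 version returned. (Though I don't know enough about Unicode to tell whether that is a bad thing.)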

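How can I make sure the Python 3 version behaves the way the Python 2 version did, without touching the SQL or format() parts of the code? Is it as simple as dropping encode('utf8'), or will that cause unexpected conflicts?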

Answer 1

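If you want to make sure that every incoming string, whether str or bytes, ends up as bytes, you have to keep the encode, because Python 3 uses str as its native string type rather than bytes (as Python 2 did). encode converts a str to bytes.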
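As a small illustration of that str/bytes split in Python 3 (the variable names text and data are only for this sketch):

```python
# Python 3 only: str holds Unicode code points, bytes holds raw bytes.
text = '\xa0'                        # a str with one code point, U+00A0
data = text.encode('utf8')           # encode() turns the str into bytes
print(data)                          # b'\xc2\xa0'
print(data.decode('utf8') == text)   # decode() reverses it: True

# Formatting a bytes object into a str template embeds its repr,
# which is where the stray b'...' in the query comes from.
print('foo = {}'.format(b'BAR'))     # foo = b'BAR'
```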

If what you want is for the query to look right, you can remove the encode and let Python 3 handle things for you.
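A minimal sketch of what that could look like, assuming that any bytes value still reaching custom_format is UTF-8 encoded text (that assumption, and the isinstance branch, are not part of the original code):

```python
def custom_format(val):
    """Return val stripped and upper-cased, always as str (Python 3 only)."""
    if isinstance(val, (bytes, bytearray)):
        # Assumption: leftover bytes inputs are UTF-8 encoded text.
        val = val.decode('utf8')
    return val.strip().upper()


print('foo = {}'.format(custom_format('bar')))    # foo = BAR
print('foo = {}'.format(custom_format(b'bar')))   # foo = BAR

# str.strip() treats U+00A0 (no-break space) as whitespace,
# so on a str this value is stripped away entirely.
print(repr(custom_format('\xa0')))                # ''
```

One thing to watch for: bytes.strip() only removes ASCII whitespace, which is why the encoded version kept b'\xc2\xa0', whereas str.strip() also removes Unicode whitespace such as the no-break space. If callers depend on that character surviving, that difference may matter.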

Handling encode() when converting from python2 to python3
