Question

I am using p.encode('utf-8') to encode strings as UTF-8. I am trying to catch anything that might go wrong:

def assert_encoding(s):
    try:
        if s is None or pd.isnull(s) or (not isinstance(s, basestring)) or s.decode('utf-8') :
            return True
    except UnicodeError:
        return False

The strings pass assert_encoding(s), but the INSERT INTO my Postgres database (configured for UTF-8) fails with an error saying that 0xC3 0x20 is not a valid UTF-8 byte sequence.

Is there a loophole in assert_encoding?

Answer 1

I think I may have found the cause.

Given:

s = 'cil à cil'.decode('latin-1')

Then we encode it as UTF-8:

'cil à cil'.decode('latin-1').encode('utf-8')

Some columns are very long, so I have to shorten them by doing:

 'cil à cil'.decode('latin-1').encode('utf-8')[0:x]

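where x is the number of characters, or at least what I thought was the number of characters.

In fact, the slice is applied to the encoded byte string, so x counts bytes. By setting x incorrectly, I may cut the UTF-8 string in the wrong place, in the middle of a multi-byte character: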

'cil à cil'.decode('latin-1').encode('utf-8')[0:7].decode('utf-8')
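To make the effect easy to reproduce, here is a minimal sketch in Python 3 (the snippets above are Python 2). The helper name is_valid_utf8 and the cut index 5 are my own choices for illustration; the index at which the cut lands inside a character depends on how the original bytes were produced.

```python
# Python 3 sketch: slicing an encoded string counts bytes, not characters.
text = "cil à cil"                 # 9 characters
encoded = text.encode("utf-8")     # 10 bytes: "à" becomes 0xC3 0xA0


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


whole_ok = is_valid_utf8(encoded)  # the untruncated bytes are valid

# Cutting after 5 bytes keeps 0xC3 but drops its continuation byte 0xA0,
# so the result ends in a dangling lead byte.
cut = encoded[:5]
cut_ok = is_valid_utf8(cut)
```

Here the untruncated bytes pass the check while the truncated bytes do not, which is consistent with a validation step that runs only before the slice.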

In my code, I only checked the encoding before shortening the string, so the check never saw the truncated bytes.
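One possible way to avoid this, sketched in Python 3 under the assumption that the column limit is expressed in bytes: cut the encoded bytes, then drop any incomplete trailing sequence by decoding with errors="ignore" and re-encoding. The function name truncate_utf8 is hypothetical, and this is not necessarily the fix I ended up using.

```python
def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Hypothetical helper: encode text as UTF-8 and keep at most max_bytes
    bytes without leaving a partial multi-byte character at the end."""
    cut = text.encode("utf-8")[:max_bytes]
    # The input was valid UTF-8, so the only possible damage is an
    # incomplete sequence at the very end; errors="ignore" drops it.
    return cut.decode("utf-8", errors="ignore").encode("utf-8")


short = truncate_utf8("cil à cil", 5)   # the dangling 0xC3 is removed
exact = truncate_utf8("cil à cil", 6)   # "à" fits completely and is kept
```

If the limit is really a number of characters rather than bytes, slicing the text before encoding (text[:x].encode("utf-8")) avoids the problem altogether. Either way, any encoding check should run on the final, shortened value.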

Catching 0xC3 in UTF-8 encoded strings
