Question

I have the following byte string in Python 3:

bytestring = b'Zeer ge\xc3\xafnteresseerd naar iemands verhalen luisteren.'

How can I convert it to a string with normal characters? That is:

'Zeer geïnteresseerd naar iemands verhalen luisteren.'

I have already tried decoding it with:

bytestring.decode('utf-8')

But when I try to print the value to the console, Python gives me the following error:

UnicodeEncodeError: 'ascii' codec can't encode character '\xef' in position 7: ordinal not in range(128)

Any help is appreciated.

Solution

I solved the problem by typing the following in the terminal:

export PYTHONIOENCODING=UTF-8

After that, I was able to print the decoded byte string to the console.
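
Why this works: the decode call had already succeeded. The UnicodeEncodeError came from print, which has to encode the text again for a console whose encoding was ASCII. The sketch below reproduces that without touching the real terminal by simulating two consoles with io.TextIOWrapper. The names ascii_console and utf8_console are illustrative and do not come from the original post.

```python
import io

bytestring = b'Zeer ge\xc3\xafnteresseerd naar iemands verhalen luisteren.'
text = bytestring.decode('utf-8')  # decoding itself works fine

# Simulated console that can only encode ASCII, like the one in the question.
ascii_console = io.TextIOWrapper(io.BytesIO(), encoding='ascii', newline='\n')
try:
    print(text, file=ascii_console)
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    error_message = str(exc)

# Simulated console that encodes UTF-8, as after setting PYTHONIOENCODING=UTF-8.
utf8_buffer = io.BytesIO()
utf8_console = io.TextIOWrapper(utf8_buffer, encoding='utf-8', newline='\n')
print(text, file=utf8_console)
utf8_console.flush()
written = utf8_buffer.getvalue()
```

On Python 3.7 and later, calling sys.stdout.reconfigure(encoding='utf-8') at the start of a script is an in-process alternative to the environment variable, assuming the terminal itself can display UTF-8.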

Answer 1

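It looks like you are dealing with unicode rather than a plain string. See if this helps. You can decode with the custom function below, which tries UTF-8 first and falls back to Latin-1; the result is then encoded to ASCII.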

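def CustomDecode(mystring):
    '''Accepts bytes (or a str holding byte values) and tries UTF-8 first, then Latin-1.'''
    # In Python 3 only bytes can be decoded; map a str of code points < 256 back to bytes.
    c = mystring.encode('latin1') if isinstance(mystring, str) else bytes(mystring)
    try:
        decval = c.decode('utf8')
    except UnicodeDecodeError:
        # Latin-1 assigns a character to every byte value, so this fallback cannot fail.
        decval = c.decode('latin1')
    return decval


# Drop every non-ASCII character, then turn the ASCII bytes back into a str.
CustomDecode(bytestring).encode('ascii', 'ignore').decode('ascii')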

Result (the 'ignore' error handler drops the ï):

'Zeer genteresseerd naar iemands verhalen luisteren.'
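
If ASCII output is really required, losing the ï entirely may not be what you want. One possible variation, not part of the answer above, is to decompose accented characters with unicodedata.normalize before encoding, so that the base letter survives and only the combining mark is dropped. The helper name to_ascii is hypothetical.

```python
import unicodedata

bytestring = b'Zeer ge\xc3\xafnteresseerd naar iemands verhalen luisteren.'
text = bytestring.decode('utf-8')


def to_ascii(value):
    """Hypothetical helper: strip diacritics but keep the base letters."""
    # NFKD splits 'ï' into 'i' followed by a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize('NFKD', value)
    # The combining mark is not ASCII, so 'ignore' removes it and keeps the 'i'.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


ascii_text = to_ascii(text)
print(ascii_text)
```

This only helps for characters that decompose into an ASCII base letter plus combining marks; anything else is still dropped silently.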
