Question

I downloaded a Facebook message dataset, and its text looks like this:

f\u00c3\u00b8rste student

It should read første student, but I can't seem to decode it correctly.

I have tried:

s = 'f\u00c3\u00b8rste student'  # avoid shadowing the built-in str
print(s)
# fÃ¸rste student

s = 'f\u00c3\u00b8rste student'
print(s.encode('utf-8'))
# b'f\xc3\x83\xc2\xb8rste student'

Neither of these works.

Answer 1

To undo the encoding mix-up, first turn each character into the byte with the same ordinal by encoding with ISO-8859-1 (Latin-1), then decode those bytes as UTF-8:

>>> 'f\u00c3\u00b8rste student'.encode('iso-8859-1').decode('utf-8')
'første student'
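This works because Latin-1 maps U+0000 to U+00FF one-to-one onto bytes 0x00 to 0xFF, so 'Ã¸' (U+00C3, U+00B8) becomes the bytes C3 B8, which is the UTF-8 encoding of 'ø'. A minimal sketch of applying the same round trip to a whole dataset follows, assuming the data is a JSON export (like Facebook's message download) that json.loads/json.load parses into nested dicts, lists and strings. The helper names fix_mojibake and fix_all are hypothetical, and the fallback rules are a heuristic: strings that cannot be Latin-1-encoded, or whose bytes are not valid UTF-8, are left unchanged.

```python
import json


def fix_mojibake(text: str) -> str:
    """Hypothetical helper: undo UTF-8 bytes that were mis-read as Latin-1."""
    try:
        # Latin-1 maps U+0000..U+00FF one-to-one to bytes 0x00..0xFF,
        # so this recovers the original UTF-8 byte sequence.
        raw = text.encode("iso-8859-1")
    except UnicodeEncodeError:
        # A character above U+00FF cannot come from this kind of mix-up.
        return text
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so the text was probably correct already.
        return text


def fix_all(obj):
    """Hypothetical helper: apply fix_mojibake to every string in parsed JSON."""
    if isinstance(obj, str):
        return fix_mojibake(obj)
    if isinstance(obj, list):
        return [fix_all(item) for item in obj]
    if isinstance(obj, dict):
        return {fix_all(k): fix_all(v) for k, v in obj.items()}
    return obj


if __name__ == "__main__":
    s = "f\u00c3\u00b8rste student"
    print([hex(ord(c)) for c in s[:3]])  # ['0x66', '0xc3', '0xb8']
    print(fix_mojibake(s))  # første student

    # Made-up JSON snippet in the same escaped form as the dataset.
    raw_json = '{"content": "f\\u00c3\\u00b8rste student"}'
    data = fix_all(json.loads(raw_json))
    print(data["content"])  # første student
```

For a real file, open it with encoding="utf-8" and pass json.load(f) to fix_all instead of the inline snippet.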

How do I decode this string in Python?
