Question

Using Python 3.4, I get the following error when I try to decode a bytes object as utf-32:

Traceback (most recent call last):
  File "c:.\SharqBot.py", line 1130, in <module>
    fullR=s.recv(1024).decode('utf-32').split('\r\n')
UnicodeDecodeError: 'utf-32-le' codec can't decode bytes in position 0-3: codepoint not in range(0x110000)

and the following when I try to decode it as utf-16:

  File "c:.\SharqBot.py", line 1128, in <module>
    fullR=s.recv(1024).decode('utf-16').split('\r\n')
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x0a in position 374: truncated data

There is no error when I decode with utf-8. s is a socket connected to the Twitch IRC server irc.chat.twitch.tv on port 80.

It receives the following:

b':tmi.twitch.tv 001 absolutelyabot :Welcome, GLHF!\r\n:tmi.twitch.tv 002 absolutelyabot :Your host is tmi.twitch.tv\r\n:tmi.twitch.tv 003 absolutelyabot :This server is rather new\r\n:tmi.twitch.tv 004 absolutelyabot :-\r\n:tmi.twitch.tv 375 absolutelyabot :-\r\n:tmi.twitch.tv 372 absolutelyabot :You are in a maze of twisty passages, all alike.\r\n:tmi.twitch.tv 376 absolutelyabot :>\r\n'

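What am I doing wrong when decoding as utf-16 and utf-32? The reason I want to use utf-32 is that occasionally someone sends a character that is not in utf-8, and I want to be able to receive it instead of getting an error because utf-8 does not support that character. Thanks for your help.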

Answer 1

Try using encoding='ISO-8859-1'
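
For what it's worth, here is a small sketch of what this suggestion does in practice. ISO-8859-1 maps each of the 256 byte values to exactly one character, so decoding never raises, but text that was actually sent as UTF-8 comes out garbled as soon as it contains non-ASCII characters. The payload below is made up for illustration and is not from the question.

```python
# Hypothetical payload: an IRC-style line whose text is UTF-8 encoded.
raw = "PRIVMSG #chan :café ☕\r\n".encode("utf-8")

# ISO-8859-1 never raises: every byte becomes exactly one character,
# so the multi-byte UTF-8 sequences are shown as several wrong characters.
latin = raw.decode("iso-8859-1")

# Decoding with the encoding that was actually used recovers the text.
utf8 = raw.decode("utf-8")

print(repr(latin))
print(repr(utf8))
```

So the error goes away, but only because nothing is being validated any more.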

Answer 2

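Every Unicode ordinal can be represented in UTF-8. If decoding as UTF-8 fails, it is because the transmitted bytes are in a different encoding, or because the data is a mix of text and binary data and only part of it is UTF-8. Odds are the text is UTF-8 encoded (most network protocols are), so any non-UTF-8 data would be framing data and the like, and it would need to be parsed to extract the text.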
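
A quick way to convince yourself of this (a sketch, using a shortened copy of the bytes from the question as sample data): even the highest code point, U+10FFFF, encodes to UTF-8, and the received bytes decode cleanly as UTF-8, while the utf-32 codec reads the same ASCII bytes four at a time and ends up with values that are not valid code points.

```python
# The highest Unicode code point still round-trips through UTF-8.
highest = chr(0x10FFFF)
encoded = highest.encode("utf-8")
roundtrip = encoded.decode("utf-8")

# Shortened sample of the data shown in the question: plain ASCII,
# which is also valid UTF-8.
data = b":tmi.twitch.tv 001 absolutelyabot :Welcome, GLHF!\r\n"
lines = data.decode("utf-8").split("\r\n")

# UTF-32 takes 4 bytes per code point. Four ASCII bytes read as one
# integer give a value far above 0x10FFFF, hence the error.
utf32_error = None
try:
    data.decode("utf-32")
except UnicodeDecodeError as exc:
    utf32_error = exc

print(lines)
print(utf32_error)
```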

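Any attempt to blanket over such errors in the mixed text/binary case would just be papering over the problem, not fixing it. You need to know the encoding of the data (and its format, if not all of the text uses a single encoding), and use that. The data you receive does not magically become UTF-16 or UTF-32 because you want it to.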
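
One related thing that can produce real UTF-8 decode errors on a socket, even when the sender uses UTF-8 throughout: recv(1024) can return in the middle of a multi-byte character. A minimal sketch of handling that, assuming the protocol is UTF-8 text in lines terminated by \r\n (as the data in the question appears to be), is to buffer the raw bytes and decode only complete lines. The helper name split_lines and the sample chunks are hypothetical.

```python
def split_lines(buffer, chunk):
    """Append chunk to buffer; return (decoded complete lines, leftover bytes)."""
    buffer += chunk
    # Everything before the last b"\r\n" is complete; the tail may be partial.
    *complete, rest = buffer.split(b"\r\n")
    return [line.decode("utf-8") for line in complete], rest


# Simulated recv() results: the 3-byte character "€" is cut across two chunks.
chunks = [
    b":nick PRIVMSG #chan :price 5 \xe2\x82",
    b"\xac\r\n:tmi.twitch.tv PING\r\n",
]

buffer = b""
received = []
for chunk in chunks:
    lines, buffer = split_lines(buffer, chunk)
    received.extend(lines)

print(received)
```

In a real client the chunks would come from s.recv(1024) in a loop; decoding each chunk on its own would fail on the first one here.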

Answer 3

You can try decode/encode('utf-16-le'). I tried it and it worked for me, but I'm not sure why. :P
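
A possible explanation, sketched with made-up sample bytes: utf-16-le simply pairs up bytes, so ASCII data with an even length decodes without an exception but yields unrelated characters, while data with an odd length fails with the same "truncated data" error as in the question.

```python
even = b"PING :tmi.twitch.tv \r\n"  # even number of bytes
odd = b"PING :tmi.twitch.tv\r\n"    # odd number of bytes

# No exception, but each pair of ASCII bytes becomes one unrelated character.
garbled = even.decode("utf-16-le")

# With an odd length the last byte has no partner, so decoding fails.
odd_failed = False
try:
    odd.decode("utf-16-le")
except UnicodeDecodeError:
    odd_failed = True

print(repr(garbled))
print(odd_failed)
```

Note that the decoded string no longer contains '\r\n', so a later split('\r\n') would not find any line breaks.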
