Question

This is my code:

for line in open('u.item'):
    pass  # read each line

Whenever I run this code, it fails with the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: invalid continuation byte

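I tried to fix this by adding an extra parameter to open(), so the code looked like this:

for line in open('u.item', encoding='utf-8'):
    pass  # read each line

But it gives the same error again. What should I do? Please help.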

Answer 1

As Mark Ransom suggested, I found the right encoding. It was "ISO-8859-1", so replacing open("u.item", encoding="utf-8") with open('u.item', encoding="ISO-8859-1") solves the problem.

Answer 2

For me, ISO 8859-1 also saved a lot of trouble, mainly when using speech recognition APIs.

Example:

file = open('../Resources/' + filename, 'r', encoding="ISO-8859-1")

Answer 3

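Your file doesn't actually contain UTF-8 encoded data; it contains some other encoding. Figure out what that encoding is and use it in the open call.

In the Windows-1252 encoding, for example, 0xe9 would be the character é.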
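To make this concrete, here is a minimal sketch of checking which encodings can decode a given byte sequence. The helper name `first_working_encoding`, the candidate list, and the sample bytes are illustrative assumptions, not part of any library. Keep in mind that ISO-8859-1 maps every possible byte to a character, so it never raises; a successful decode does not prove the guess is right.

```python
def first_working_encoding(data, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Return the first candidate encoding that decodes `data` without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the bytes, try the next one
        return name
    return None


# Hypothetical sample: 0xe9 followed by an ASCII letter is not valid UTF-8.
sample = b"Les Mis\xe9rables"
encoding = first_working_encoding(sample)
text = sample.decode(encoding)
print(encoding, text)
```

For a real file, the bytes could be obtained with open(path, 'rb').read() and the resulting name passed as the encoding argument of open().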

Answer 4

Try reading it with pandas, where m_cols is your list of column names:

import pandas as pd
pd.read_csv('u.item', sep='|', names=m_cols, encoding='latin-1')

Answer 5

If you are using Python 2, the solution is as follows:

import io
for line in io.open("u.item", encoding="ISO-8859-1"):
    # do something

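This is needed because the built-in open() in Python 2 does not accept an encoding parameter. If you pass one, you get the following error: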

TypeError: 'encoding' is an invalid keyword argument for this function

Answer 6

This works:

open('filename', encoding='latin-1')

Answer 7

You can try it this way:

open('u.item', encoding='utf8', errors='ignore')
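Note that errors='ignore' does not fix the encoding; it silently drops every byte that is not valid UTF-8. The short sketch below compares the error handlers on a hypothetical byte string that is assumed to be ISO-8859-1 data:

```python
raw = b"Les Mis\xe9rables"  # assumed to be ISO-8859-1 bytes, not valid UTF-8

ignored = raw.decode("utf-8", errors="ignore")    # the undecodable byte is dropped
replaced = raw.decode("utf-8", errors="replace")  # the undecodable byte becomes U+FFFD
correct = raw.decode("iso-8859-1")                # decoded with the assumed real encoding

print(repr(ignored))
print(repr(replaced))
print(repr(correct))
```

If losing characters is not acceptable, finding the real encoding is the safer route.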

Answer 8

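In case anyone is looking for this, here is an example of reading a CSV file for conversion in Python 3:

import csv
from sys import argv

try:
    inputReader = csv.reader(open(argv[1], encoding='ISO-8859-1'), delimiter=',', quotechar='"')
except IOError:
    pass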

Answer 9

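Sometimes the same error appears when the filepath passed to open(filepath) is not actually a path to a file, so first make sure the file you are trying to open exists: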

import os
assert os.path.isfile(filepath)

Hope this helps.

Answer 10

Open the file in Notepad++ and use the "Encoding" menu to identify the encoding, or to convert the file from ANSI to UTF-8 or to the ISO 8859-1 code page.
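The same conversion can also be done in Python. This is a minimal sketch, assuming the source file really is ISO-8859-1; the function name `reencode_to_utf8` and the temporary file names are made up for the demonstration.

```python
import os
import tempfile


def reencode_to_utf8(src_path, dst_path, src_encoding="iso-8859-1"):
    """Read src_path using src_encoding and write the same text as UTF-8."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(src_path, encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demonstration on a throwaway file containing one Latin-1 encoded word.
with tempfile.TemporaryDirectory() as tmp:
    legacy = os.path.join(tmp, "legacy.txt")
    converted_path = os.path.join(tmp, "utf8.txt")
    with open(legacy, "wb") as f:
        f.write(b"caf\xe9\n")
    reencode_to_utf8(legacy, converted_path)
    with open(converted_path, "rb") as f:
        converted = f.read()

print(converted)
```

After the conversion, the new file can be opened with encoding='utf-8' without the decode error.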

Answer 11

I'm leaving my solution here so that others searching Google for similar UTF-8 errors can find it more quickly.

I had a problem opening a .csv file, with this error message:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 150: invalid continuation byte

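I opened the file in Notepad and counted to position 150: it was a Cyrillic character. I re-saved the file using the "Save as..." command with the encoding set to "UTF-8", and my program started working.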
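Instead of counting by hand, the offending bytes can be located programmatically, because UnicodeDecodeError carries the failing offsets in its start and end attributes. The sketch below is only an illustration; the helper name `describe_decode_error` and the sample bytes are assumptions.

```python
def describe_decode_error(data, encoding="utf-8", context=10):
    """Return (offset, surrounding bytes) for the first decode error, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        lo = max(exc.start - context, 0)
        return exc.start, data[lo:exc.end + context]
    return None  # the data decoded cleanly


# Hypothetical line in the style of the file from the question.
data = b"1|Les Mis\xe9rables (1995)|"
result = describe_decode_error(data)
print(result)
```

For a real file, read the bytes first with open(path, 'rb').read() and pass them in; the returned slice shows what surrounds the byte named in the error message.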

Answer 12

You can solve the problem this way:

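for line in open(your_file_path, 'rb'):
    pass  # each line is a bytes object

'rb' opens the file in binary mode, so no decoding takes place and each line is a bytes object rather than a str. Hope this helps!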
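Binary mode only postpones the decoding step: to work with text, each line still has to be decoded explicitly. Here is a minimal sketch, assuming the data is ISO-8859-1; the function name `read_lines` and the sample file contents are hypothetical.

```python
import os
import tempfile


def read_lines(path, encoding="iso-8859-1"):
    """Read a file in binary mode and decode each line explicitly."""
    lines = []
    with open(path, "rb") as f:
        for raw_line in f:  # raw_line is a bytes object
            lines.append(raw_line.decode(encoding).rstrip("\r\n"))
    return lines


# Demonstration on a throwaway file with one non-ASCII byte.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.item")
    with open(sample_path, "wb") as f:
        f.write(b"1|Toy Story\n2|Les Mis\xe9rables\n")
    lines = read_lines(sample_path)

print(lines)
```

Decoding per line like this also makes it possible to catch UnicodeDecodeError for individual lines and handle them separately.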