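I can't read a file because I get a "UnicodeDecodeError: 'utf-8' codec can't decode" error

Asked: 2017-08-28 07:16:12

Tags: python utf-8

I have a file that I want to convert to UTF-8 encoding.

When I try to read it, I get this error: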

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 947: invalid continuation byte

My goal is to read the file and then write it back out in UTF-8, but it fails at the read step.

Here is my code:

#convert all files into utf_8 format
import os
import io
path_directory="some path string"
directory = os.fsencode(path_directory)
for file in os.listdir(directory):
    file_name=os.fsdecode(file)
    file_path_source=path_directory+file_name
    file_path_dest="some address to destination file"
    with open(file_path_source,"r") as f1:
        text=f1.read()
    with io.open(file_path_dest,"w+",encoding='utf8') as f2:
        f2.write(text)
    file_path=""
    file_name=""
    text=None

And the full error is:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-47-59e5e52ddd40> in <module>()
     10     with open(file_path,"r") as f1:
     11         print(type(f1))
---> 12         text=f1.read()
     13     with io.open(file_path.replace("ref_sum","ref_sum_utf_8"),"w+",encoding='utf8') as f2:
     14         f2.write(text)

/home/afsharizadeh/anaconda3/lib/python3.6/codecs.py in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 947: invalid continuation byte

How can I convert the file to UTF-8 when I can't even read it?

1 answer:

Answer 0 (score: 1)

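This is fairly straightforward. In Python 3, open() decodes text with UTF-8 by default on most systems (the default encoding in Python 2 is ASCII). If the file is not UTF-8, you have to state its actual encoding when you open it: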

io.open(file_path_source, "r", encoding='ISO-8859-1')

In this case the encoding is most likely ISO-8859-1 (where byte 0xe9 is the character é), so you have to specify it.
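
Putting the read and write steps together, a minimal sketch of the whole conversion might look like the following. The function names convert_to_utf8 and convert_directory are hypothetical, and the sketch assumes every file in the source directory really is ISO-8859-1; adjust src_encoding if your files use something else.

```python
import os


def convert_to_utf8(src_path, dest_path, src_encoding="ISO-8859-1"):
    """Re-encode one text file from src_encoding (an assumption) to UTF-8."""
    # Decode with the file's actual encoding instead of the default one,
    # which is what raised UnicodeDecodeError in the question.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    # newline="" on both sides leaves the original line endings untouched.
    with open(dest_path, "w", encoding="utf-8", newline="") as dest:
        dest.write(text)


def convert_directory(src_dir, dest_dir, src_encoding="ISO-8859-1"):
    """Convert every regular file in src_dir and write the result to dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src_path = os.path.join(src_dir, name)
        if os.path.isfile(src_path):
            convert_to_utf8(src_path, os.path.join(dest_dir, name), src_encoding)
```

Note that ISO-8859-1 maps every possible byte to a character, so decoding with it never raises an error. The output is only correct if the files really use that encoding, so it is worth checking a converted file by eye.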