Python 'charmap' codec can't decode byte 0x9d

Asked: 2018-01-14 17:24:51

Tags: python utf-8

I am trying to run the following Python script:

#! python
import textmining
import glob

tdm = textmining.TermDocumentMatrix()

files = glob.glob("C:/Users/farre/Desktop/matrix/blurbs/*")
print(files)
for f in files:
  content = open(f).read()
  content = content.replace('\n', ' ')
  tdm.add_doc(content)
tdm.write_csv('matrix.csv', cutoff=1)

But I get this error:

  File "matrix.py", line 13, in <module>
    content = open(f).read()
  File "C:\Users\farre\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 688: character maps to <undefined>

I have tried several suggestions I found here, but none of them worked. I also tried io.open(filename, encoding="utf8"), but then I got:

  File "matrix.py", line 11, in <module>
    content = io.open(f, encoding="utf8").read()
  File "C:\Users\farre\AppData\Local\Programs\Python\Python36\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 310: invalid start byte

Does anyone know how to fix this?
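
A possible reading of the two tracebacks, offered as a guess rather than a confirmed diagnosis: in cp1252 the byte 0x94 is the right curly quote (U+201D), and the same character in UTF-8 is the byte sequence E2 80 9D. The first error would fit a UTF-8 file read with the Windows default codec, and the second a cp1252 file read as UTF-8, which suggests the folder may contain files in both encodings. A minimal sketch under that assumption follows. `decode_bytes` and `read_text` are hypothetical helper names, not part of textmining or the standard library.

```python
def decode_bytes(raw, encodings=("utf-8", "cp1252")):
    """Decode raw bytes by trying each encoding in order.

    Assumption: every file is either UTF-8 or cp1252. UTF-8 goes first
    because it is strict and rejects most cp1252 text, while cp1252
    accepts almost any byte sequence.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing undecodable bytes with U+FFFD.
    return raw.decode(encodings[-1], errors="replace")


def read_text(path, encodings=("utf-8", "cp1252")):
    """Read a file in binary mode and decode it with decode_bytes."""
    with open(path, "rb") as fh:
        return decode_bytes(fh.read(), encodings)
```

In the loop from the question, `content = read_text(f)` would then take the place of `content = open(f).read()`. If the files turn out to use some other encoding, the tuple of candidates would need to change accordingly.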

0 answers:

No answers yet