Why can't my Python code open all the text files in a folder?

Time: 2019-05-03 14:35:02

Tags: python utf-8

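I've written some code that opens all the text files in a folder, removes certain words, and writes the contents to a new text file. The files I'm working with were created on a Windows machine, saved as utf-8, and then downloaded to a Mac (where the problem occurs). The code works for 66 of the 250 files and then breaks. I get the following error: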

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-51-7c4734f2a95f> in <module>
      1 for file in os.listdir(path):
      2         with open(file, 'r', encoding='utf-8') as f:
----> 3             flines = f.readlines()
      4             new_content = []
      5             for line in flines:

~/anaconda/lib/python3.7/codecs.py in decode(self, input, final)
    320         # decode input (taking the buffer into account)
    321         data = self.buffer + input
--> 322         (result, consumed) = self._buffer_decode(data, self.errors, final)
    323         # keep undecoded input until the next call
    324         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 287: invalid start byte

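I checked the encoding of some of the files that failed to convert by running file -I {filename} in the terminal, and it does report charset=utf-8. Even so, I think the problem must be the encoding.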
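One way to narrow this down is to decode each file's raw bytes and record exactly where decoding fails. The sketch below is only an illustration: the helper name `find_undecodable` is hypothetical, and it assumes the files of interest end in `.txt`. Note that `file -I` guesses the charset heuristically from the file's contents, so its report is not a guarantee. Byte 0x92 cannot start a UTF-8 sequence; in Windows-1252 it is the curly apostrophe (’), so a file containing it was probably saved, at least in part, in that encoding.

```python
import os

def find_undecodable(folder, encoding="utf-8"):
    """Return (filename, byte offset, offending byte) for each file that fails to decode."""
    failures = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".txt"):
            continue  # assumption: skip non-text entries such as .DS_Store on macOS
        with open(os.path.join(folder, name), "rb") as fh:
            raw = fh.read()
        try:
            raw.decode(encoding)
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte that could not be decoded
            failures.append((name, exc.start, raw[exc.start:exc.start + 1]))
    return failures
```

Calling `find_undecodable(path)` on the folder from the question would list every file that is not valid UTF-8, rather than stopping at the first one.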

I tried encoding='ascii', and I tried using 'rb' instead of 'r', but neither worked. I think this might help me, but I can't work out how to incorporate it into my code: https://www.programiz.com/python-programming/methods/string/encode
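Some background on why those attempts fail: ASCII only covers bytes 0x00 to 0x7F, so it cannot decode 0x92 either; opening with 'rb' yields bytes rather than str (and combining binary mode with an `encoding` argument raises ValueError); and `str.encode` converts text to bytes, whereas the failure here happens while decoding bytes to text. A minimal sketch of one possible workaround follows, assuming the problem files are really Windows-1252 (`cp1252`). That is a guess based on the 0x92 byte, not something the question confirms, and the function name `read_text_with_fallback` is hypothetical.

```python
def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Try each encoding in turn and return (text, name of the encoding that worked)."""
    with open(path, "rb") as fh:
        raw = fh.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # Last resort: decode with the first encoding, replacing bad bytes with U+FFFD
    return raw.decode(encodings[0], errors="replace"), encodings[0] + " (lossy)"
```

In the loop from the question, `text, used = read_text_with_fallback(file)` followed by `flines = text.splitlines()` could stand in for the `open(...)`/`readlines()` pair. Logging `used` shows which files were not UTF-8 after all.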

Any help would be greatly appreciated.

import os

# Filler words to drop (matched before lower-casing, hence both capitalisations)
fillers = ['um', 'uum', 'umm', 'er', 'eer', 'uh', 'ah', 'ahh', 'hm', 'hmm', 'mm', 'Um', 'Uum', 'Umm', 'Er', 'Eer', 'Uh', 'Ah', 'Ahh', 'Hm', 'Hmm', 'Mm']

for file in os.listdir(path):
    # Join with the folder so the script does not depend on the current directory
    with open(os.path.join(path, file), 'r', encoding='utf-8') as f:
        flines = f.readlines()

    new_content = []
    for line in flines:
        content = line.split()

        # Drop annotation tokens such as [=...], #..., and leading '..'
        new_content_line = []
        for word in content:
            if not word.startswith('[=') and not word.startswith('#') and not word.startswith('..') and not word.endswith(']') and not word.endswith('='):
                new_content_line.append(word)

        # Drop filler words, then lower-case what is left
        new_content_line2 = [word.lower() for word in new_content_line if word not in fillers]

        # Remove a word when it is immediately repeated
        for v, w in zip(new_content_line2[:-1], new_content_line2[1:]):
            if v == w:
                new_content_line2.remove(v)

        new_content.append(' '.join(new_content_line2))

    # Write the cleaned lines to <name>_processed.txt in the same folder
    out_name = file.rsplit(".", 1)[0] + "_processed.txt"
    with open(os.path.join(path, out_name), "w", encoding='utf-8') as f2:
        f2.write('\n'.join(new_content))
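A side note on the duplicate-removal loop in the code above: `list.remove(v)` deletes the first occurrence of `v` anywhere in the list, which is not necessarily the one at the position being compared. A minimal sketch of an alternative that collapses only adjacent repeats, using `itertools.groupby` from the standard library; the function name `collapse_repeats` is hypothetical.

```python
from itertools import groupby

def collapse_repeats(words):
    """Collapse each run of identical adjacent words into a single word."""
    # groupby yields one (key, run) pair per run of equal neighbours
    return [word for word, _run in groupby(words)]
```

With this helper, the `zip`/`remove` loop could be replaced by `new_content_line2 = collapse_repeats(new_content_line2)`.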

0 answers:

No answers yet