
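I am using the nltk.corpus module in Python (3.6.3) to build and analyze a corpus I created. The corpus consists of several hundred documents. To access the contents of a document in the corpus I call .raw(), but this raises a decoding error.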

fileids = newcorpus.fileids()  # newcorpus is the PlaintextCorpusReader object I have created

for f in fileids:
    if f not in normalized_docs:
        p = newcorpus.raw(f)

The error I get is:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xeb in position 200: invalid continuation byte

What can I do to prevent this from happening?
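One possible cause, sketched here under assumptions: byte 0xeb is "ë" in Latin-1 and Windows-1252, so some of the files may have been saved in a single-byte encoding rather than UTF-8. The helper below lists the files that fail to decode as UTF-8. The names `find_non_utf8_files` and `open_corpus`, the `*.txt` pattern, and the choice of "latin-1" are illustrative assumptions, not something stated in the question. The corpus reader classes in NLTK accept an `encoding` argument, which `open_corpus` passes through.

```python
import tempfile
from pathlib import Path


def find_non_utf8_files(root):
    """Return the names of .txt files directly under root that are not valid UTF-8."""
    bad = []
    for path in sorted(Path(root).glob("*.txt")):
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path.name)
    return bad


def open_corpus(root, encoding="latin-1"):
    """Build a reader that decodes files with the given encoding.

    Assumption: "latin-1" is only a guess at the real encoding of the files.
    """
    # Imported lazily so the helper above works without NLTK installed.
    from nltk.corpus.reader.plaintext import PlaintextCorpusReader
    return PlaintextCorpusReader(root, r".*\.txt", encoding=encoding)


if __name__ == "__main__":
    # Small demonstration with two throwaway files.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "ok.txt").write_bytes("plain text".encode("utf-8"))
        # "ë" becomes the single byte 0xeb in Latin-1.
        Path(tmp, "bad.txt").write_bytes("Zoë Smith".encode("latin-1"))
        print(find_non_utf8_files(tmp))
```

Latin-1 maps every byte to a character, so decoding with it never raises, but the text will contain wrong characters if the files are actually in some other encoding. If only a few files are affected, re-saving those files as UTF-8 may be the cleaner fix.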

Reading corpus text with nltk.corpus.reader.plaintext - Python 3

0 answers: