Question

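I am trying to open and read a text file and count how many times each word occurs; for example, if the word "better" appears in the text, its frequency might be 8. I have attached my code below. I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 861: invalid start byte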

file = open('IntroductoryCS.txt')

wordcount = {}

for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1

for k, v in wordcount.items():
    print(k, v)

I am using IDLE 3.5.1.

Answer 1

Your IntroductoryCS.txt does not appear to be UTF-8 encoded.

You should specify the file's actual encoding in the open() call.

Something like this:

file=open('IntroductoryCS.txt', encoding='<your_encoding_here>')

See the documentation for open().

I don't know what encoding your file uses, but try this:

file=open('IntroductoryCS.txt', encoding='latin-1')

The Python documentation for the codecs module lists the available encodings.
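As a rough sketch of the approach above, the snippet below reads the file with an explicit encoding and counts the words with collections.Counter. The helper names count_words and count_words_guessing, and the list of candidate encodings, are assumptions for illustration and not part of the original question. Byte 0x97 is an em dash in Windows-1252 (cp1252), so that encoding is a plausible guess for a file saved on Windows, but it is only a guess. Note that latin-1 maps every byte to a character, so it never raises an error, though it may show the wrong characters.

```python
from collections import Counter


def count_words(path, encoding):
    """Count whitespace-separated words in a file decoded with `encoding`."""
    # The `with` block closes the file even if decoding fails.
    with open(path, encoding=encoding) as f:
        return Counter(f.read().split())


def count_words_guessing(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Try each candidate encoding in order; return (encoding, counts)."""
    for enc in candidates:
        try:
            return enc, count_words(path, enc)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError("none of the candidate encodings could decode " + path)
```

Calling count_words_guessing('IntroductoryCS.txt') returns the first encoding that decoded without error together with the counts, so you can check whether the guess looks right before trusting the result.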

Answer 2

Your code runs fine.

Try saving the text file as UTF-8: open it in Notepad, choose Save As, and select UTF-8 as the encoding.
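If you would rather do the conversion in Python than in Notepad, something along these lines should work. This is a minimal sketch: the function name resave_as_utf8 is hypothetical, and the default source encoding cp1252 is an assumption about how the file was originally saved, so replace it with the real encoding if you know it.

```python
def resave_as_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Read src_path using src_encoding and write the same text as UTF-8."""
    # newline="" keeps the original line endings unchanged in both directions.
    with open(src_path, encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

After converting, the original open('IntroductoryCS.txt') call only works unchanged if UTF-8 is the default encoding on your system; passing encoding='utf-8' explicitly is the safer choice.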

Python word occurrences
