Question

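My assignment is to take the State of the Union addresses of Presidents Bush and Obama and find the number of unique words (the vocabulary size) and the 20 most frequently used words. I'm at the first stage, counting the unique words, and this is what I have so far: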

    def uniqueWords(fileName):
        fileIn = open(fileName, 'rt', encoding = 'UTF-8')
        fileS = open('stopwords.txt', 'rt', encoding = 'UTF-8')
        stop = fileS.split()
        words = fileIn.split()
        for x in range(len(words)):
            words[x] = words[x].lower()
            for z in words[x]:
                if z in '~!@#$%^&*()+=_:;,./\?"{}[]<>|':
                    words[x] = words[x].replace(z, '')

        unique_words = 0
        while unique_words < len(words):
            if words[i] in stop:
                words.remove(words[i])
            else:
                unique_words += 1
        return unique_words

I keep getting the following error:

    uniqueWords('bush_all.txt')
    Traceback (most recent call last):
      File "", line 1, in
        uniqueWords('bush_all.txt')
      File "/Users/sarahloughran/Documents/CSCI 203/final project/countWords.py", line 12, in uniqueWords
        stop = fileS.split()
    AttributeError: '_io.TextIOWrapper' object has no attribute 'split'

Everywhere I've looked tells me to use file.split(), so I'm not sure why I'm getting this error. Any help is greatly appreciated, thanks!

Answer 1

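open() returns a file object, not a string, so it has no split() method. You first have to read the lines from the file, and only then can you split each line into individual words. You can do both with a list comprehension: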

    stop = [word for line in fileS.readlines() for word in line.split()]
    words = [word for line in fileIn.readlines() for word in line.split()]

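With that change the AttributeError goes away. The counting loop still needs attention, though: it indexes words[i] but i is never defined, and it counts the words left after removing stop words rather than the distinct ones.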
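
Putting the pieces together, the whole task (vocabulary size plus the 20 most common words) could be sketched roughly as below. This is only one possible version, not the asker's code: the helper names read_words, clean_words and vocabulary_stats are made up for illustration, file.read().split() is used in place of the list comprehension (it yields the same tokens), and a collections.Counter is assumed to be an acceptable way to count distinct words. The punctuation set is the one from the question.

```python
from collections import Counter

# Punctuation characters taken from the question's code (backslash written explicitly).
PUNCTUATION = '~!@#$%^&*()+=_:;,./\\?"{}[]<>|'


def read_words(file_name):
    """Return the whitespace-separated tokens of a text file."""
    # open() gives a file object; read() turns it into a str, which does have split().
    with open(file_name, 'rt', encoding='UTF-8') as file_in:
        return file_in.read().split()


def clean_words(words, stop_words):
    """Lowercase each token, strip punctuation, and drop stop words and empty tokens."""
    table = str.maketrans('', '', PUNCTUATION)
    stop = set(stop_words)
    cleaned = (word.lower().translate(table) for word in words)
    return [word for word in cleaned if word and word not in stop]


def vocabulary_stats(words, stop_words, top_n=20):
    """Return (number of distinct words, the top_n most common (word, count) pairs)."""
    counts = Counter(clean_words(words, stop_words))
    return len(counts), counts.most_common(top_n)
```

Assuming the files from the question exist, it might be called as vocabulary_stats(read_words('bush_all.txt'), read_words('stopwords.txt')). Building a new list in clean_words also avoids removing items from a list while looping over it, which is easy to get wrong.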
