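I am trying to write a program that reads a text file, removes the stop words, counts how often each remaining word occurs in a dictionary, and then sorts that dictionary.
I seem to be able to find the most common words, but when I sort the dictionary to show the most frequent words first, I get a NoneType instead of a list, and a TypeError is raised. Why does this happen?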
import string
#Read in book and stopwords (lower case)
sense_and_sensibility_dirty = open("Sense_and_Sensibility.txt").read().rstrip("\n")
stop_words = open("stopwords.txt").read().split()
stop_words = [x.lower() for x in stop_words]
#Remove punctuation from the book and clean it up
translator = str.maketrans('', '', string.punctuation)
sns = sense_and_sensibility_dirty.translate(translator)
sns = sns.split()
#Convert words in book to lowercase
sns = [x.lower() for x in sns]
#Remove stop words from book
sns = [x for x in sns if x not in stop_words]
#Count up words in the book and write word and count to dictionary
word_count={}
for word in sns:
    if word not in word_count:
        word_count[word] = 1
    else:
        word_count[word] += 1
#Sort the dictionary to display most frequent
e = sorted(word_count.items(), key=lambda item: item[1])
e = e.reverse()
e[:4]
For example, e[:4] should output something like:
[('time', 237), ('dashwood', 224), ('sister', 213), ('miss', 209)]
But instead I get:
"TypeError: 'NoneType' object is not subscriptable".
Answer 0 (score: 2)
lst.reverse() reverses the list in place and returns None, so you should not assign its result back to the variable:
e = sorted(word_count.items(), key=lambda item: item[1])
e.reverse()
e[:4]
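To make the behaviour concrete, here is a minimal sketch using a small made-up dictionary (the words and counts are placeholder data, not taken from the book). It shows that reverse() returns None while the list itself is reversed, and that slicing with [::-1] is one alternative if a new list is wanted:

```python
# Hypothetical counts, for illustration only
word_count = {"time": 3, "sister": 2, "miss": 1, "dashwood": 5}

# Sort ascending by count, as in the question
pairs = sorted(word_count.items(), key=lambda item: item[1])

# list.reverse() mutates the list and returns None
result = pairs.reverse()
print(result)      # None
print(pairs[:2])   # the list itself is now in descending order

# Slicing builds a new reversed list and leaves the original untouched
ascending_again = pairs[::-1]
print(ascending_again[:2])
```

Assigning the return value of reverse() back to the same name is what replaces the list with None in the question.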
Answer 1 (score: 0)
I hope this helps! Pass reverse=True to the sort call itself:
from string import punctuation
from collections import Counter
with open('Sense_and_Sensibility.txt','r') as f:
    sense_and_sensibility_dirty = f.read()
with open('stopwords.txt','r') as f:
    stop_words = f.read().split()
stop_words = [x.lower() for x in stop_words]
sense_and_sensibility_dirty = sense_and_sensibility_dirty.lower()
all_text = ''.join([c for c in sense_and_sensibility_dirty if c not in punctuation])
sense_and_sensibility_dirty_split = all_text.split('\n')
all_text = ' '.join(sense_and_sensibility_dirty_split)
words = all_text.split()
req_words = [word for word in words if word not in stop_words]
word_counts = Counter(req_words)
sorted_words = sorted(word_counts, key = word_counts.get, reverse= True)
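Note that sorting the Counter this way yields only the words, without their counts. If the (word, count) pairs from the question are wanted, Counter.most_common may be a simpler fit. A minimal sketch, assuming a short placeholder word list in place of the book text:

```python
from collections import Counter

# Placeholder word list standing in for req_words; not real book data
req_words = ["time", "sister", "time", "miss", "time", "sister"]
word_counts = Counter(req_words)

# most_common(n) returns the n most frequent (word, count) pairs, highest first
top_pairs = word_counts.most_common(2)
print(top_pairs)   # [('time', 3), ('sister', 2)]

# Sorting the Counter directly gives the words only, most frequent first
top_words = sorted(word_counts, key=word_counts.get, reverse=True)
print(top_words)   # ['time', 'sister', 'miss']
```

With the real data, word_counts.most_common(4) would play the role of e[:4] in the question.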