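I have a txt file from which I want to compute the frequency of each word. I then want to sort the list and print the words together with their frequencies in descending order. I wrote some Python code, but I don't know how to do this. The code is: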
frequency = []
file = open("C:/Python26/rzlt.txt");
contents=file.read();
tokens = nltk.word_tokenize(contents);
f=open("frequencies.txt",'w')
f2=open("count.txt",'w')
for t in tokens:
    freq = str(tokens.count(t))
    frequency.append(freq)
    f.write(t+"\t"+freq)
frequency.sort(reverse=True)
for t in tokens:
    f2.write(t+"\t"+ frequency(t))
f.close()
f2.close()
Answer 0 (score: 1)
with open(...) as f: closes the file automatically when the block ends. collections.Counter() counts every word in the list. Finally, sorted() orders the counted items by decreasing count.
import collections
with open('my_text_file.txt', 'r') as f:
    f_as_lst = f.read().split()
c = collections.Counter(f_as_lst)
# Create a list of (count, word) tuples so the count comes first
freq_lst = [(v, k) for k, v in c.items()]
# Sort by count, highest first
freq_lst = sorted(freq_lst, key=lambda item: item[0], reverse=True)
print(freq_lst)
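The question also asks for the words to be listed from most to least frequent. Counter.most_common() already returns (word, count) pairs in that order, so the key/value swap can be skipped. The following is a minimal sketch, assuming plain whitespace splitting is good enough as tokenization; format_frequencies and sample_text are hypothetical names used only for illustration.

```python
import collections


def format_frequencies(text):
    """Return 'word<TAB>count' lines, most frequent word first."""
    counts = collections.Counter(text.split())
    # most_common() yields (word, count) pairs in descending count order.
    return ["%s\t%d" % (word, n) for word, n in counts.most_common()]


sample_text = "the cat and the dog and the bird"
lines = format_frequencies(sample_text)
print("\n".join(lines))
```

To produce a file like the frequencies.txt in the question, the joined lines could be passed to f.write() inside a with open("frequencies.txt", "w") as f: block.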
If you cannot use collections.Counter(), you can replace it with the following function:
def my_counter(list_of_strings):
    dct = {}
    for string in list_of_strings:
        if string not in dct:
            dct[string] = 1
        else:
            dct[string] += 1
    return dct
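A plain dict has no most_common() method, so the result of a hand-written counter still has to be sorted by value. One possible way to do that is sketched below. It assumes a dict-based counter like the one above; count_words is a hypothetical name, and dict.get is used here as a slightly shorter variant of the same counting logic.

```python
def count_words(words):
    """Count occurrences of each word using a plain dict."""
    counts = {}
    for word in words:
        # dict.get supplies 0 for a word seen for the first time.
        counts[word] = counts.get(word, 0) + 1
    return counts


counts = count_words("a b a c b a".split())
# Sort (word, count) pairs by count, largest first.
ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
for word, n in ranked:
    print("%s\t%d" % (word, n))
```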
Answer 1 (score: 1)
Try it like this, using Counter:
import nltk
from collections import Counter
with open("C:/Python26/rzlt.txt") as f:
    contents = f.read()
tokens = nltk.word_tokenize(contents)
# Keep only alphanumeric tokens, dropping punctuation
words = [t for t in tokens if t.isalnum()]
frequency = Counter(words)
# Sort by count, highest first
for x, y in sorted(frequency.items(), key=lambda x: x[1], reverse=True):
    print("%s\t%d" % (x, y))
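If nltk is not installed, a regular expression can serve as a rough substitute for the tokenizing step. This is a swapped-in technique, not what nltk.word_tokenize does internally. The sketch below assumes that lowercasing the text and keeping only runs of ASCII letters and digits is acceptable; tokenize and sample are hypothetical names.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (a rough stand-in for nltk)."""
    return re.findall(r"[a-z0-9]+", text.lower())


sample = "The cat saw the dog. The dog ran!"
frequency = Counter(tokenize(sample))
# most_common() lists the words from most to least frequent.
for word, n in frequency.most_common():
    print("%s\t%d" % (word, n))
```

Lowercasing merges "The" and "the" into one entry, which may or may not be wanted depending on the text.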
Answer 2 (score: 1)
Try this. I used collections to count each word, and to display the counts in descending order I used sorted with the parameter reverse=True.
import collections  ## import the collections module
file = open("filename.txt")  ## open the file to be processed
words = []  ## create an empty list
for data in file:  ## iterate over the lines of the file
    words.extend(data.split())  ## collect the words on each line
file.close()
print("sorted data : ")
print("===============================================")
print("\n".join(sorted(words)))  ## print each word on its own line
count = collections.Counter(words)  ## get the count of each word
print("===============================================")
print("Count of each word is:")
## iterate over the words in descending order of count
for word, n in sorted(count.items(), key=lambda item: item[1], reverse=True):
    print('%s : %d' % (word, n))  ## print each word with its count
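When several words share the same count, sorting on the count alone leaves their relative order dependent on how they happened to be stored. A composite sort key makes the listing reproducible. The sketch below assumes ties should be broken alphabetically; the word list is made up for illustration.

```python
from collections import Counter

words = "pear apple pear fig apple pear fig kiwi".split()
count = Counter(words)
# Negating the count puts high counts first; the word itself breaks ties.
ranked = sorted(count.items(), key=lambda pair: (-pair[1], pair[0]))
for word, n in ranked:
    print("%s : %d" % (word, n))
```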