How to find word frequencies and sort them in Python

Time: 2015-04-03 10:22:23

Tags: python

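I have a txt file and want to count how often each word occurs in it. I then want to sort the list and print each word with its frequency, in descending order of frequency. I wrote the Python code below, but I don't know how to do this. The code is: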

frequency = []
file = open("C:/Python26/rzlt.txt");
contents=file.read();
tokens = nltk.word_tokenize(contents);
f=open("frequencies.txt",'w')
f2=open("count.txt",'w')
for t in tokens:
    freq = str(tokens.count(t))
    frequency.append(freq)
    f.write(t+"\t"+freq)
frequency.sort(reverse=True)
for t in tokens:
    f2.write(t+"\t"+ frequency(t))
f.close()
f2.close()

3 Answers:

Answer 0: (score: 1)

with open() as ...: closes the file automatically. collections.Counter() counts every word in the list.

Finally, sorted() sorts the items of the Counter() object in descending order of count.

import collections

with open('my_text_file.txt', 'r') as f:

    f_as_lst = f.read().split()
    c = collections.Counter(f_as_lst)

# Creates a list of tuples with values and keys swapped
freq_lst = [(v, k) for k, v in c.items()]
# Sorts list by frequency, highest first
freq_lst = sorted(freq_lst, key=lambda item: item[0], reverse=True)

print(freq_lst)
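
As a possible shortcut, Counter.most_common() already returns (word, count) pairs ordered from most to least frequent, so the swap-and-sort step may not be needed. A minimal sketch, assuming a short in-memory string in place of the file (the `text` value is made up for illustration):

```python
import collections

# Hypothetical sample text standing in for the file contents
text = "the cat and the dog and the bird"

counts = collections.Counter(text.split())

# most_common() returns (word, count) pairs, highest count first
for word, count in counts.most_common():
    print('%s\t%d' % (word, count))
```

Passing a number, as in counts.most_common(10), limits the result to that many of the most frequent words.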

If you cannot use collections.Counter(), you can replace it with the following function:

def my_counter(list_of_strings):    
    dct = {}

    for string in list_of_strings:
        if string not in dct:
            dct.update({string: 1})
        else:
            dct[string] += 1

    return dct
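
A usage sketch for a hand-written counter like the one above, with the result sorted in descending order of count. The function is restated here as a compact variant using dict.get so that the block runs on its own, and the sample word list is made up for illustration:

```python
def my_counter(list_of_strings):
    # Compact variant of the counter: dict.get supplies 0 for unseen words
    dct = {}
    for string in list_of_strings:
        dct[string] = dct.get(string, 0) + 1
    return dct

# Hypothetical sample data standing in for the words read from a file
words = ["b", "a", "b", "c", "b", "a"]
counts = my_counter(words)

# Sort by count (highest first); ties are broken alphabetically by word
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

for word, count in ranked:
    print('%s\t%d' % (word, count))
```

Negating the count in the sort key gives descending counts while keeping tied words in ascending alphabetical order, which reverse=True alone would not do.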

Answer 1: (score: 1)

Try it like this, using Counter:

import nltk
from collections import Counter

with open("C:/Python26/rzlt.txt") as file:
    contents = file.read()
tokens = nltk.word_tokenize(contents)
# Keep only alphanumeric tokens (drops punctuation)
words = [t for t in tokens if t.isalnum()]
frequency = Counter(words)

# Sort by count, highest first
for x, y in sorted(frequency.items(), key=lambda x: x[1], reverse=True):
    print('%s %d' % (x, y))

Answer 2: (score: 1)

Try this. I used collections to get the count of each word,
and to display the words in descending order I used sorted with the parameter
reverse=True.

import collections ## import the collections module
words = [] ## Create the empty list
with open("filename.txt") as infile: ## open the file whose words are counted
    for line in infile: ## Iterate over the lines of the file
        words.extend(line.split()) ## Collect the words on each line
print("sorted data : ")
print("===============================================")
print("\n".join(sorted(words))) ## Print each word on its own line, sorted
count = collections.Counter(words) ## Get the count of each word
print("===============================================")
print("Count of each word is:")
for word in sorted(count, key=count.get, reverse=True): ## Iterate by descending count
    print('%s : %d' % (word, count[word])) ## Print the word and its count
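
The question also writes its results to a file rather than printing them. A minimal sketch of that step, assuming the words are already in a list; the helper name write_frequencies and its parameters are hypothetical and do not come from any of the answers:

```python
import collections

def write_frequencies(words, path):
    """Write 'word<TAB>count' lines to path, most frequent first."""
    counts = collections.Counter(words)
    with open(path, 'w') as out:
        # most_common() yields (word, count) pairs in descending count order
        for word, count in counts.most_common():
            out.write('%s\t%d\n' % (word, count))
    return counts
```

With the question's variables, this could be called as write_frequencies(tokens, "count.txt"). Counting once with Counter also avoids calling tokens.count(t) for every token, which rescans the whole list each time.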