Question

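I have a txt file and want to compute the frequency of each word in it. I then want to sort the list and print the words with their frequencies in descending order. I wrote some Python code, but I don't know how to do this. The code is: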

frequency = []
file = open("C:/Python26/rzlt.txt");
contents=file.read();
tokens = nltk.word_tokenize(contents);
f=open("frequencies.txt",'w')
f2=open("count.txt",'w')
for t in tokens:
    freq = str(tokens.count(t))
    frequency.append(freq)
    f.write(t+"\t"+freq)
frequency.sort(reverse=True)
for t in tokens:
    f2.write(t+"\t"+ frequency(t))
f.close()
f2.close()

Answer 1

with open() as ...: closes the file automatically. collections.Counter() counts every word in the list.

Finally, sorted() sorts the items of the Counter() object by count in descending order.

import collections

with open('my_text_file.txt', 'r') as f:

    f_as_lst = f.read().split()
    c = collections.Counter(f_as_lst)

# Creates a list of tuples with values and keys swapped
freq_lst = [(v, k) for k, v in c.items()]
# Sorts list by frequency, highest count first
freq_lst = sorted(freq_lst, key=lambda item: item[0], reverse=True)

print(freq_lst)
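
For the descending order the question asks for, Counter also offers most_common(), which returns (word, count) pairs from most to least frequent. A minimal sketch, assuming Python 2.7 or later and a made-up sample string standing in for the file contents (the variable names are illustrative only):

```python
import collections

# Illustrative sample text; in practice this would come from f.read().
text = "the cat and the dog and the bird"

counts = collections.Counter(text.split())

# most_common() returns (word, count) pairs, highest count first.
ranked = counts.most_common()

for word, count in ranked:
    print("%s\t%d" % (word, count))
```

This avoids swapping keys and values by hand, at the cost of requiring Counter to be available.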

If you cannot use collections.Counter(), you can replace it with the following function:

def my_counter(list_of_strings):    
    dct = {}

    for string in list_of_strings:
        if string not in dct:
            dct.update({string: 1})
        else:
            dct[string] += 1

    return dct
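
The dictionary returned by a helper like my_counter can be sorted the same way as a Counter. A short sketch, assuming a compact variant of the function above written with dict.get and a made-up word list:

```python
def my_counter(list_of_strings):
    # Plain-dict replacement for collections.Counter.
    dct = {}
    for string in list_of_strings:
        dct[string] = dct.get(string, 0) + 1
    return dct


words = ["b", "a", "b", "c", "b", "a"]  # illustrative input
counts = my_counter(words)

# Sort the (word, count) pairs by count, largest first.
by_freq = sorted(counts.items(), key=lambda item: item[1], reverse=True)
print(by_freq)
```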

Answer 2

Try it like this, using Counter:

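import nltk
from collections import Counter

# Read the whole file; the with block closes it automatically
with open("C:/Python26/rzlt.txt") as file:
    contents = file.read()
tokens = nltk.word_tokenize(contents)
# Keep only alphanumeric tokens (drops punctuation)
words = [t for t in tokens if t.isalnum()]
frequency = Counter(words)

# Print each word with its count, most frequent first
for x, y in sorted(frequency.items(), key=lambda x: x[1], reverse=True):
    print("%s %d" % (x, y))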
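
The question also wants the sorted result written to a file. One possible sketch, assuming a simple regular-expression tokenizer in place of nltk.word_tokenize (so the example runs without NLTK) and a hypothetical helper named write_frequencies:

```python
import os
import re
import tempfile
from collections import Counter


def write_frequencies(text, out_path):
    """Write 'word<TAB>count' lines to out_path, most frequent first.

    Hypothetical helper. The regex tokenizer is a stand-in for
    nltk.word_tokenize and keeps only runs of letters, digits and '_'.
    """
    words = re.findall(r"\w+", text.lower())
    ranked = Counter(words).most_common()
    with open(out_path, "w") as out:
        for word, count in ranked:
            out.write("%s\t%d\n" % (word, count))
    return ranked


# Usage with a temporary directory so the working directory is untouched.
sample = "To be, or not to be: that is the question."
out_path = os.path.join(tempfile.mkdtemp(), "frequencies.txt")
ranked = write_frequencies(sample, out_path)

with open(out_path) as result:
    print(result.read())
```

Lowercasing before counting is a design choice here, so that "To" and "to" are counted as the same word; drop the lower() call if case should be preserved.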

Answer 3

Try this. I used collections to get the count of each word, and sorted
with the parameter reverse=True to display the result in descending
order.

import collections      ## import the collections module
words = []                      ## create the empty list
print("sorted data : ")
print("===============================================")
with open("filename.txt") as file:   ## open the file to be counted
    for data in file:                ## each line is treated as one word
        words.append(data.strip())
print("\n".join(sorted(words))) ## print the entries alphabetically, one per line
count = collections.Counter(words)    ## get the count of each word
print("===============================================")
print("Count of each word is:")
for data in sorted(count, key=count.get, reverse=True):   ## iterate the words, highest count first
    print('%s : %d' % (data, count[data]))    ## print each word with its count
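
When several words share the same count, sorting on the count alone leaves their relative order up to the input. A sketch of one way to get a predictable listing: sort by descending count, then alphabetically. The word list is made up for illustration.

```python
import collections

words = ["pear", "apple", "fig", "apple", "pear", "apple", "kiwi"]
count = collections.Counter(words)

# Negating the count sorts high-to-low; the word itself breaks ties A-Z.
ranked = sorted(count.items(), key=lambda item: (-item[1], item[0]))

for word, n in ranked:
    print("%s : %d" % (word, n))
```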

How to find word frequencies after sorting in Python

3 answers: