Question

I need to sort words by their frequency counts, as shown below.

Splitting the words after removing stop words:

words=Counter([item for sublist in m.split('\W+') for item in word_tokenize(sublist)])

Frequency counts:

wordsFreq=['%s: %d' %(x, words[x]) for x in words]

Output:

["limited: 1", "desirable: 1", "advices: 1","new: 8", "net: 5", "increasing: 2",......]

print type(wordsFreq)

Output:

<type 'list'>

Answer 1

One approach is to convert the data into a dictionary, with the words as keys and the frequencies as values:

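import operator

in_lst = ["limited: 1", "desirable: 1", "advices: 1",
          "new: 8", "net: 5", "increasing: 2"]

# Map each word to its count; convert the count to int so that sorting is
# numeric (as strings, "10" would sort before "2")
freq_dict = {word: int(count) for word, count in (i.split(": ") for i in in_lst)}

# Sort the (word, count) pairs by count, in increasing order
sorted_lst = sorted(freq_dict.items(), key=operator.itemgetter(1))

# Rebuild the original "word: count" strings
out_lst = ["%s: %d" % pair for pair in sorted_lst]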

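The program then sorts the items by their values in the dictionary. sorted_lst is a list of tuples, which is then converted back into a list of strings in the original format, sorted by frequency in increasing order. For decreasing order, pass reverse=True to sorted.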

Another solution is to use OrderedDict from the collections module.
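
A minimal sketch of that OrderedDict variant, assuming the same in_lst input as above. The names pairs and ordered are only illustrative, and the counts are converted to int here so that the ordering is numeric rather than lexicographic:

```python
from collections import OrderedDict

in_lst = ["limited: 1", "desirable: 1", "advices: 1",
          "new: 8", "net: 5", "increasing: 2"]

# Split each "word: count" string and convert the count to int
pairs = [(word, int(count)) for word, count in (s.split(": ") for s in in_lst)]

# Insert the pairs in increasing order of count; OrderedDict keeps that order
ordered = OrderedDict(sorted(pairs, key=lambda pair: pair[1]))

# Rebuild the "word: count" strings in sorted order
out_lst = ["%s: %d" % (word, count) for word, count in ordered.items()]
```

Since sorted is stable, words with equal counts keep their original relative order. The OrderedDict is mainly useful if you still want to look up counts by word afterwards while keeping the sorted order.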
