I need to sort the words by their frequency counts, shown below.
Splitting into words after removing stop words:
words=Counter([item for sublist in m.split('\W+') for item in word_tokenize(sublist)])
Frequency counts:
wordsFreq=['%s: %d' %(x, words[x]) for x in words]
Output:
["limited: 1", "desirable: 1", "advices: 1","new: 8", "net: 5", "increasing: 2",......]
print type(wordsFreq)
Output:
<type 'list'>
Answer 0 (score: 0)
One approach is to convert the data into a dictionary, with the words as keys and the frequencies as values:
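import operator
in_lst = ["limited: 1", "desirable: 1", "advices: 1",
          "new: 8", "net: 5", "increasing: 2"]
# Parse each "word: count" string; convert the count to int so the sort
# is numeric rather than lexicographic (as strings, "10" sorts before "2")
freq_dict = {word: int(count) for word, count in (i.split(": ") for i in in_lst)}
# Sort the (word, count) pairs by count, ascending
sorted_lst = sorted(freq_dict.items(), key=operator.itemgetter(1))
# Rebuild the original "word: count" string format
out_lst = ["%s: %d" % pair for pair in sorted_lst]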
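The program then sorts the items by their values in the dictionary. sorted_lst is a list of tuples, which is then converted back into a list of strings in the original format, sorted by frequency in ascending order.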
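If the most frequent words should come first, the same idea works with reverse=True. Below is a minimal sketch, assuming the entries keep the "word: count" format shown in the question; the helper name sort_by_freq is hypothetical and not part of the original answer.

```python
def sort_by_freq(entries, descending=False):
    """Sort 'word: count' strings by their numeric count."""
    def count_of(entry):
        # rsplit on the last ": " so a word containing ": " still parses
        return int(entry.rsplit(": ", 1)[1])
    # sorted() is stable, so entries with equal counts keep their input order
    return sorted(entries, key=count_of, reverse=descending)


in_lst = ["limited: 1", "desirable: 1", "advices: 1",
          "new: 8", "net: 5", "increasing: 2"]
top_first = sort_by_freq(in_lst, descending=True)
```

Sorting the strings directly with a key function skips the intermediate dictionary, which also preserves duplicate words if the input happens to contain any.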
Another solution is to use OrderedDict from the collections module.
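A minimal sketch of the OrderedDict approach, assuming the same "word: count" input format; the variable names pairs and ordered_freq are illustrative only. The pairs are sorted first, and the OrderedDict then remembers that insertion order.

```python
from collections import OrderedDict

in_lst = ["limited: 1", "desirable: 1", "advices: 1",
          "new: 8", "net: 5", "increasing: 2"]

# Split each entry into [word, count] on the last ": "
pairs = [entry.rsplit(": ", 1) for entry in in_lst]

# Sort (word, int_count) tuples by count, ascending, then store them in
# an OrderedDict so iteration follows the sorted order
ordered_freq = OrderedDict(
    sorted(((word, int(count)) for word, count in pairs),
           key=lambda kv: kv[1])
)

# Convert back to the original "word: count" strings
out_lst = ["%s: %d" % kv for kv in ordered_freq.items()]
```

Unlike a plain list of strings, ordered_freq still supports lookups such as ordered_freq["new"] while keeping the frequency order.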