I am trying to print the 10 most frequent words using the following code, but it does not work. Any ideas on how to fix it?
def reducer_count_words(self, word, counts):
    # send all (num_occurrences, word) pairs to the same reducer.
    # num_occurrences is so we can easily use Python's max() function.
    yield None, (sum(counts), word)

# discard the key; it is just None
def reducer_find_max_10_words(self, _, word_count_pairs):
    # each item of word_count_pairs is (count, word),
    # so yielding one results in key=counts, value=word
    tmp = sorted(word_count_pairs)[0:10]
    yield tmp
Answer 0 (score: 2)
Use collections.Counter and its most_common method:
>>> from collections import Counter
>>> my_words = 'a a foo bar foo'
>>> Counter(my_words.split()).most_common()
[('a', 2), ('foo', 2), ('bar', 1)]
Answer 1 (score: 1)
Example:
most_common([n])
Return a list of the n most common elements and their counts from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered in the order first encountered:
>>> from collections import Counter
>>> Counter('abracadabra').most_common(3)
[('a', 5), ('b', 2), ('r', 2)]
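Applied to the question, the same call can return the ten most frequent words directly. The following is only a minimal sketch: top_words is a hypothetical helper name, and it assumes the text fits in memory and that splitting on whitespace is an acceptable definition of a word.

```python
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent whitespace-separated words in text."""
    # top_words is a hypothetical helper, not part of the question's code.
    # Lowercasing is an assumption so that "The" and "the" count together.
    return Counter(text.lower().split()).most_common(n)


sample = "the cat and the dog and the bird"
print(top_words(sample, 2))  # [('the', 3), ('and', 2)]
```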
Answer 2 (score: 0)
tmp = sorted(word_count_pairs, key=lambda pair: pair[0], reverse=True)[0:10]
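Explanation:
The key parameter of sorted() lets you run a function on each element before the elements are compared. lambda pair: pair[0] is a function that extracts the count from each (count, word) pair in word_count_pairs. reverse=True sorts in descending order instead of ascending order.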
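Put together, the fix might look like the sketch below. It is written as a standalone generator rather than a method so that it runs on its own. find_top_10_words is a hypothetical name, and yielding one (count, word) pair at a time is an assumption based on the comment in the question, which says that yielding one pair results in key=count, value=word. Note that the original sorted(word_count_pairs)[0:10] sorts in ascending order, so it keeps the ten smallest pairs rather than the ten largest.

```python
def find_top_10_words(word_count_pairs):
    """Yield the ten (count, word) pairs with the highest counts."""
    # Sort by count, highest first, then keep the first ten pairs.
    top = sorted(word_count_pairs, key=lambda pair: pair[0], reverse=True)[:10]
    for count, word in top:
        # Yield one (key, value) pair per word instead of the whole list.
        yield count, word


pairs = [(3, "the"), (1, "cat"), (2, "and")]
print(list(find_top_10_words(pairs)))  # [(3, 'the'), (2, 'and'), (1, 'cat')]
```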
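Aside: if you have many distinct words, sorting the entire list just to find the top ten is inefficient; more efficient algorithms exist. The most_common() method mentioned in another answer probably uses one of them.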
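One such alternative in the standard library is heapq.nlargest, which avoids sorting the whole input when only a few items are needed. The sketch below is an illustration rather than a drop-in reducer; top_n_pairs is a hypothetical name, and the pairs are assumed to be (count, word) tuples as in the question.

```python
import heapq


def top_n_pairs(word_count_pairs, n=10):
    """Return the n (count, word) pairs with the highest counts."""
    # heapq.nlargest accepts any iterable and returns the n largest
    # items, ordered from largest to smallest according to key.
    return heapq.nlargest(n, word_count_pairs, key=lambda pair: pair[0])


pairs = [(3, "the"), (1, "cat"), (2, "and"), (5, "a")]
print(top_n_pairs(pairs, 2))  # [(5, 'a'), (3, 'the')]
```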