Question

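I've run into the following problem:

Problem:

Implement a function count_words() in Python that takes a string s and a number n as input and returns the n most frequently occurring words in s. The return value should be a list of tuples - the top n words paired with their respective counts [(<word>, <count>), (<word>, <count>), ...], sorted in descending count order.

You can assume that all input is lowercase and contains no punctuation or other characters (only letters and single separating spaces). In case of a tie (equal counts), order the tied words alphabetically.

E.g.:

print(count_words("betty bought a bit of butter but the butter was bitter", 3)) outputs:

[('butter', 2), ('a', 1), ('betty', 1)]

This is my solution: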

"""Count words."""

from operator import itemgetter
from collections import Counter

def count_words(s, n):
    """Return the n most frequently occurring words in s."""

    # TODO: Count the number of occurrences of each word in s
    words = s.split(" ")
    words = Counter(words)
    # TODO: Sort the occurrences in descending order (alphabetically in case of ties)
    print(words)
    # TODO: Return the top n words as a list of tuples (<word>, <count>)
    top_n = words.most_common(n)
    return top_n

def test_run():
    """Test count_words() with some inputs."""
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 3))


if __name__ == '__main__':
    test_run()

The problem is that elements with equal counts come back in arbitrary order. How can I order them alphabetically?

Answer 1

You can sort them by the number of occurrences (in reverse order) and then by lexicographical order:

>>> lst = [('meat', 2), ('butter', 2), ('a', 1), ('betty', 1)]
>>> 
>>> sorted(lst, key=lambda x: (-x[1], x[0]))
#                              ^ reverse order 
[('butter', 2), ('meat', 2), ('a', 1), ('betty', 1)]

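The number of occurrences takes precedence over lexicographical order.

In your case, pass words.items() instead of the list of tuples I used here. You no longer need most_common, because sorted already produces the full ordering; slice the result with [:n] to keep the top n.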
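
Putting the pieces above together, a minimal sketch of the whole function might look like this. The names counts and ranked are my own choice and not from the question, and I use split() without an argument, which is an assumption that tolerating extra whitespace is acceptable:

```python
from collections import Counter


def count_words(s, n):
    """Return the n most frequent words in s as (word, count) tuples."""
    # split() with no argument also tolerates repeated whitespace.
    counts = Counter(s.split())
    # Negating the count sorts it in descending order, while the word
    # itself breaks ties in ascending alphabetical order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


print(count_words("betty bought a bit of butter but the butter was bitter", 3))
# [('butter', 2), ('a', 1), ('betty', 1)]
```

The negation trick only works because the counts are numbers; for a non-numeric primary key you would need a different approach, such as sorting in two passes.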

Answer 2

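Python's sorted function is stable, which means that in case of a tie, the tied items keep their original relative order. Because of this, you can sort on the words first to get them in alphabetical order: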

alphabetical_sort = sorted(words.items(), key=lambda x: x[0])

Then sort on the counts:

final_sort = sorted(alphabetical_sort, key=lambda x: x[1], reverse=True)

Edit: I didn't see Moses' better answer. Of course, the fewer sorts the better.
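
For completeness, here is a rough sketch of the two-pass approach wrapped in a function. The name count_words_two_pass is hypothetical, chosen only for this illustration. It relies on the fact that sorted stays stable even with reverse=True, so words with equal counts keep the alphabetical order from the first pass:

```python
from collections import Counter
from operator import itemgetter


def count_words_two_pass(s, n):
    """Top n words by count, with ties broken alphabetically."""
    counts = Counter(s.split())
    # Pass 1: secondary key first, so ties end up in alphabetical order.
    by_word = sorted(counts.items(), key=itemgetter(0))
    # Pass 2: primary key; stability preserves the order from pass 1
    # among items with equal counts, even with reverse=True.
    by_count = sorted(by_word, key=itemgetter(1), reverse=True)
    return by_count[:n]


print(count_words_two_pass("cat bat mat cat bat cat", 3))
# [('cat', 3), ('bat', 2), ('mat', 1)]
```

The general rule with stable sorts is to sort by the least significant key first and the most significant key last.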

Answer 3

Here is another way to approach the problem:

def count_words(s, n):
    words = s.split(" ")
    # TODO: Count the number of occurrences of each word in s
    counters = {}
    for word in words:
        if word in counters:
            counters[word] += 1
        else:
            counters[word] = 1
    # TODO: Sort the occurrences in descending order (alphabetically in case of ties)
    # items() works on both Python 2 and 3 (iteritems() is Python 2 only)
    top = sorted(counters.items(), key=lambda d: (-d[1], d[0]))

    # TODO: Return the top n words as a list of tuples (<word>, <count>)
    top_n = top[:n]
    return top_n

def test_run():
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 3))

if __name__ == '__main__':
    test_run()

The n most frequently occurring words in a string
