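I have the following problem:
Problem:
Implement a function count_words() in Python that takes a string s and a number n as input and returns the n most frequently occurring words in s. The return value should be a list of tuples: the top n words paired with their respective counts [(<word>, <count>), (<word>, <count>), ...], sorted in descending order of count.
You can assume that all input is lowercase and contains no punctuation or other characters (only letters and single separating spaces). In case of a tie (equal counts), order the tied words alphabetically.
E.g.:
print(count_words("betty bought a bit of butter but the butter was bitter", 3)) Output: [('butter', 2), ('a', 1), ('betty', 1)]
Here is my solution: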
"""Count words."""
from operator import itemgetter
from collections import Counter


def count_words(s, n):
    """Return the n most frequently occurring words in s."""
    # TODO: Count the number of occurrences of each word in s
    words = s.split(" ")
    words = Counter(words)
    # TODO: Sort the occurrences in descending order (alphabetically in case of ties)
    print(words)
    # TODO: Return the top n words as a list of tuples (<word>, <count>)
    top_n = words.most_common(n)
    return top_n


def test_run():
    """Test count_words() with some inputs."""
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 3))


if __name__ == '__main__':
    test_run()
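The problem is that elements with the same count are ordered arbitrarily. How can I order those elements alphabetically?
Answer 0 (score: 3)
You can sort them by number of occurrences (in reverse order) and then by lexicographic order: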
>>> lst = [('meat', 2), ('butter', 2), ('a', 1), ('betty', 1)]
>>>
>>> sorted(lst, key=lambda x: (-x[1], x[0]))
#                              ^ reverse order
[('butter', 2), ('meat', 2), ('a', 1), ('betty', 1)]
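The number of occurrences takes precedence over lexicographic order.
In your case, use words.items() in place of the list of tuples I used. You no longer need most_common, because sorted already produces the required ordering; slice the result with [:n] to keep only the top n.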
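Applied to the question's function, the idea can be sketched as follows. This is a minimal sketch, assuming Python 3 and the count_words(s, n) signature from the question; s.split() is used instead of s.split(" "), which is equivalent under the stated single-space assumption.

```python
from collections import Counter


def count_words(s, n):
    """Return the n most frequent words in s as (word, count) tuples."""
    counts = Counter(s.split())
    # Negating the count sorts it in descending order, while the word
    # itself stays ascending, so ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Slicing takes the place of most_common(n).
    return ranked[:n]


print(count_words("cat bat mat cat bat cat", 3))
print(count_words("betty bought a bit of butter but the butter was bitter", 3))
```

The negation trick works here because the counts are numbers; a key that cannot be negated would need a different approach, such as two stable sorts.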
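Answer 1 (score: 0)
Python's sorted function is stable, which means that in case of a tie the tied items keep their original relative order. You can therefore sort by word first, to put them in alphabetical order: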
alphabetical_sort = sorted(words.items(), key=lambda x: x[0])
Then by count:
final_sort = sorted(alphabetical_sort, key=lambda x: x[1], reverse=True)
Edit: I hadn't seen Moses' better answer. Of course, the fewer sorts the better.
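As a self-contained illustration of the two-pass approach, the sketch below runs both sorts on the question's second test sentence. It assumes Python 3; the variable names by_word and by_count are made up for this example. Note that reverse=True does not disturb stability, so tied words keep the alphabetical order from the first pass.

```python
from collections import Counter

sentence = "betty bought a bit of butter but the butter was bitter"
counts = Counter(sentence.split())

# Pass 1: sort by the secondary key (the word), ascending.
by_word = sorted(counts.items(), key=lambda item: item[0])
# Pass 2: sort by the primary key (the count), descending. Because
# sorted() is stable, words with equal counts keep their pass-1 order.
by_count = sorted(by_word, key=lambda item: item[1], reverse=True)

print(by_count[:3])
```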
Answer 2 (score: 0)
Here is another way to conceptualize the problem:
def count_words(s, n):
    words = s.split(" ")
    # TODO: Count the number of occurrences of each word in s
    counters = {}
    for word in words:
        if word in counters:
            counters[word] += 1
        else:
            counters[word] = 1
    # TODO: Sort the occurrences in descending order (alphabetically in case of ties)
    top = sorted(counters.items(), key=lambda d: (-d[1], d[0]))
    # TODO: Return the top n words as a list of tuples (<word>, <count>)
    top_n = top[:n]
    return top_n


def test_run():
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 3))


if __name__ == '__main__':
    test_run()