Question

I have a list of words:

words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah']

I want to get a list of tuples:

[(3, 'all'), (2, 'yeah'), (1, 'bye'), (1, 'awesome')]

where each tuple is...

(number_of_occurrences, word)

The list should be sorted by number of occurrences, highest first.

What I have so far:

def popularWords(words):
    dic = {}
    for word in words:
        dic.setdefault(word, 0)
        dic[word] += 1
    wordsList = [(dic.get(w), w) for w in dic]
    wordsList.sort(reverse = True)
    return wordsList

The question is...

Is it Pythonic, elegant and efficient? Can you do better? Thanks in advance.

Answer 1

You can use collections.Counter for this.

import collections
words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah']
counter = collections.Counter(words)
print(counter.most_common())
# Output on Python 3.7+: [('all', 3), ('yeah', 2), ('awesome', 1), ('bye', 1)]
# On older versions the order of words with equal counts is arbitrary.

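Note that the tuples come out with the columns reversed: (word, count) rather than (count, word).

From the comments: collections.Counter is available in Python >= 2.7 and 3.1. For lower versions, you can use the Counter recipe.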
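
If the (count, word) layout from the question is needed, the pairs returned by most_common() can be swapped afterwards. The following is a minimal sketch of that step, not part of the original answer; the variable names pairs and popular are made up for illustration.

```python
from collections import Counter

words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah']

# most_common() yields (word, count) pairs sorted by count, highest first
pairs = Counter(words).most_common()

# Swap each pair to get the (count, word) layout the question asks for
popular = [(count, word) for word, count in pairs]
print(popular)
```

The order of the words that share a count (here 'awesome' and 'bye') depends on how most_common() breaks ties, so it may differ from the order shown in the question.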

Answer 2

The defaultdict class from collections is what you're looking for:

from collections import defaultdict

D = defaultdict(int)
for word in words:
    D[word] += 1

This gives you a dictionary whose keys are the words and whose values are the frequencies. To get your (frequency, word) tuples:

tuples = [(freq, word) for word, freq in D.items()]  # D.iteritems() exists only on Python 2

If you are using Python 2.7+ / 3.1+, you can do the first step with the built-in Counter class:

from collections import Counter
D = Counter(words)
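
Putting the steps of this answer together, and adding the sort that the question asks for, might look like the sketch below. The function name popular_words is hypothetical, and sorting with reverse=True is carried over from the question's own code rather than from this answer.

```python
from collections import defaultdict

def popular_words(words):
    """Return (count, word) tuples, most frequent first."""
    counts = defaultdict(int)  # missing keys start at 0
    for word in words:
        counts[word] += 1
    # Tuples compare by count first, then by word, so with reverse=True
    # ties on count end up in reverse alphabetical order.
    return sorted(((freq, word) for word, freq in counts.items()), reverse=True)

words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah']
result = popular_words(words)
print(result)  # [(3, 'all'), (2, 'yeah'), (1, 'bye'), (1, 'awesome')]
```

Because whole tuples are compared, this reproduces the exact ordering shown in the question, including 'bye' before 'awesome'.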

Answer 3

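Is it Pythonic, elegant and efficient?

Looks good to me...

Can you do better?

"Better"? If it's understandable and efficient, isn't that enough?

Maybe take a look at defaultdict and use it instead of setdefault.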
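
To make the suggestion concrete, here is a small side-by-side sketch of the two counting styles. It is only an illustration; the names counts_setdefault and counts_defaultdict are invented for this example.

```python
from collections import defaultdict

words = ['all', 'awesome', 'all', 'yeah', 'bye', 'all', 'yeah']

# Style from the question: setdefault creates the key before incrementing
counts_setdefault = {}
for word in words:
    counts_setdefault.setdefault(word, 0)
    counts_setdefault[word] += 1

# defaultdict(int) creates a missing key with value 0 on first access,
# so the explicit setdefault call is no longer needed
counts_defaultdict = defaultdict(int)
for word in words:
    counts_defaultdict[word] += 1

print(dict(counts_defaultdict) == counts_setdefault)  # True
```

Both loops produce the same counts; the defaultdict version just drops one line from the loop body.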

Find the most popular words in a list