
Question

I'm new to Python and I'm trying to sort a list and get the 3 most frequently used words. This is what I have so far:

from collections import Counter

reader = open("longtext.txt", 'r')
data = reader.read()
reader.close()
words = data.split()  # Into a list
unique = sorted(set(words))  # Remove duplicate words and sort
for word in unique:
    print('%s: %s' % (word, words.count(word)))  # words.count counts the occurrences of word

This is my output. How can I sort by the most frequently used words and list only the first, second and third most common ones?

2: 2
3.: 1
3?: 1
New: 1
Python: 5
Read: 1
and: 1
between: 1
choosing: 1
or: 2
to: 1

Answer 1

You can use the most_common method of collections.Counter, like this:

from collections import Counter
with open("longtext.txt", "r") as reader:
    # Split every line into words and count each word
    c = Counter(word for line in reader for word in line.split())
print(c.most_common(3))

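Here is the example from the official documentation (elements with equal counts, such as 'r' and 'b' here, may be listed in a different order; on Python 3.7+ they appear in the order first encountered):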

>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]

If you want to print them as shown in the question, you can simply iterate over the most common elements and print them like this:

for word, count in c.most_common(3):
    print("{}: {}".format(word, count))
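As a self-contained sketch of the same approach, the counting and printing steps can be wrapped in small helpers. The function names `top_words` and `top_words_in_file` and the sample sentence are illustrative assumptions, not part of the original answer.

```python
from collections import Counter


def top_words(text, n=3):
    """Return the n most common whitespace-separated words in text."""
    # str.split() with no argument splits on any run of whitespace,
    # so newlines and tabs are treated like spaces.
    return Counter(text.split()).most_common(n)


def top_words_in_file(path, n=3):
    """Same idea, but reads the file one line at a time."""
    with open(path, "r") as reader:
        counts = Counter(word for line in reader for word in line.split())
    return counts.most_common(n)


# Hypothetical sample text; top_words_in_file("longtext.txt") would
# do the same for the file from the question.
sample = "the cat and the hat and the bat"
for word, count in top_words(sample):
    print("{}: {}".format(word, count))
```

This should print `the: 3` and `and: 2`, followed by one of the words that occur only once.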

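Note: the Counter approach is better than sorting. Building the Counter takes O(N) time for N words, and most_common(3) only has to pick out the top three counts, whereas sorting all the distinct words takes O(N log N) in the worst case. (The loop in the question is slower still, because every words.count call scans the whole list again.)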
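To illustrate the complexity argument, here is a small sketch comparing three ways of getting the top three. It relies on a CPython implementation detail rather than a documented guarantee: Counter.most_common(n) with an explicit n selects the items with heapq.nlargest, while most_common() with no argument sorts every item. The word list is a made-up example.

```python
import heapq
from collections import Counter
from operator import itemgetter

words = "a b a c b a d".split()
c = Counter(words)  # a single pass over the words: O(N)

# Heap-based selection of the top 3, which is what most_common(3) does in CPython.
top_via_counter = c.most_common(3)
top_via_heap = heapq.nlargest(3, c.items(), key=itemgetter(1))

# Full sort of all distinct words, then slice: O(U log U) for U distinct words.
top_via_sort = sorted(c.items(), key=itemgetter(1), reverse=True)[:3]

print(top_via_counter)  # [('a', 3), ('b', 2), ('c', 1)]
```

All three give the same result here; the difference is only in how much work is done to get it.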

Answer 2

Besides the Pythonic most_common approach, you can also use sorted. Here d is a dictionary holding the word counts from the question's output:

>>> d = {'2': 2, '3.': 1, '3?': 1, 'New': 1, 'Python': 5, 'Read': 1, 'and': 1, 'between': 1, 'choosing': 1, 'or': 2, 'to': 1}
>>> sorted(d.items(), key=lambda x: x[1])[-3:]
[('2', 2), ('or', 2), ('Python', 5)]
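Because sorted() orders ascending, the three most common words come out last, lowest count first. A variant sketch (not part of the original answer) passes reverse=True so the most common word comes first, matching the order of most_common:

```python
# Word counts taken from the question's output.
d = {'2': 2, '3.': 1, '3?': 1, 'New': 1, 'Python': 5, 'Read': 1,
     'and': 1, 'between': 1, 'choosing': 1, 'or': 2, 'to': 1}

# reverse=True sorts from highest to lowest count, so the three most
# common words are at the front and can be taken with [:3].
top3 = sorted(d.items(), key=lambda item: item[1], reverse=True)[:3]
print(top3)
```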

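Or use heapq.nlargest. Note, though, that nlargest() works best when the number of items you want is small relative to the size of the collection; for larger numbers, sorted() is usually more efficient: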

>>> import heapq
>>> heapq.nlargest(3, d.items(), key=lambda x: x[1])
[('Python', 5), ('2', 2), ('or', 2)]

Answer 3

Another way, more Pythonic!

Here is another approach that uses neither Counter nor the count method. Hopefully it gives you some more ideas.

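# To read the text from the file instead, use:
# with open("longtext.txt", "r") as reader:
#     data = reader.read()
data = 'aa sfds fsd f sd aa dfdsa dfdsa dfdsa sd sd sds ds dsd sdds sds sd sd sd sd sds sd sds'
words = data.split()
word_dic = {}
for word in words:
    try:
        word_dic[word] += 1   # word already seen: increment its count
    except KeyError:
        word_dic[word] = 1    # first occurrence of this word
# Sort (count, word) pairs in ascending order and keep the last three,
# i.e. the three most frequent words.
print(sorted([(value, key) for (key, value) in word_dic.items()])[-3:])
# [(3, 'dfdsa'), (4, 'sds'), (8, 'sd')]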
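The same dictionary-counting idea can be written without try/except by using dict.get with a default of 0. This is a sketch of an alternative, not the answer's own code; sorting the keys with key=word_dic.get also avoids building (count, word) tuples.

```python
data = 'aa sfds fsd f sd aa dfdsa dfdsa dfdsa sd sd sds ds dsd sdds sds sd sd sd sd sds sd sds'

word_dic = {}
for word in data.split():
    # dict.get returns 0 for a word that has not been seen yet.
    word_dic[word] = word_dic.get(word, 0) + 1

# Sort the words by their counts, highest first, and keep three.
top3 = sorted(word_dic, key=word_dic.get, reverse=True)[:3]
print(top3)  # ['sd', 'sds', 'dfdsa']
```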

Sort a list and get the most frequently used words
