Least common words in a file

Time: 2015-12-07 23:30:31

Tags: python collections

I am interested in finding the least common words in a file.

from collections import Counter

# Load the file and extract the words
lines = open("mobydick.txt").readlines()
words = [ word for l in lines for word in l.rstrip().split() ]
print 'No of words in the file:', len(words)

# Use counter to get the counts
counts = Counter( words )

print 'Least common words:'
for word, count in sorted(counts.most_common()[:-3], key=lambda (word, count): (count, word), reverse=True):
    print '%s %s' % (word, count)

How do I limit the output to 3 words? It prints out a whole bunch.

3 answers:

Answer 0: (score: 5)

You are slicing the list the wrong way. Compare:

print [1,2,3,4,5][:-3]
[1, 2]
print [1,2,3,4,5][-3:]
[3, 4, 5]
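
Applied to a counter like the one in the question, the same difference looks roughly like this. This is a minimal Python 3 sketch; the sample text is made up for illustration and only stands in for the contents of mobydick.txt.

```python
from collections import Counter

# Hypothetical sample text standing in for the contents of mobydick.txt
text = "a a a a a b b b b c c c d d e"
counts = Counter(text.split())

# most_common() with no argument returns every (word, count) pair,
# ordered from most common to least common
ranked = counts.most_common()

all_but_last_three = ranked[:-3]  # what the question's loop iterates over
last_three = ranked[-3:]          # the three least common entries

print(all_but_last_three)
print(last_three)
```

With a real book, `ranked[:-3]` contains almost every distinct word, which is why the original loop prints so much.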

Answer 1: (score: 2)


Answer 2: (score: 2)

Just slice the last three entries:

for word, count in counts.most_common()[-3:]:
    print '%s %s' % (word, count)

And as @Joran commented, you don't need to sort the result of counts.most_common(), because it is already ordered.
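
If the rarest word should come first, the slice can be reversed as well. The sketch below is in Python 3 and uses a hypothetical helper name, `least_common`, which is not part of `collections`; the sample counter is invented for illustration.

```python
from collections import Counter

def least_common(counts, n):
    """Return the n least common (word, count) pairs, rarest first.

    Hypothetical helper for illustration; Counter itself has no such method.
    """
    # most_common() returns all entries, most common first; the reversed
    # slice walks backwards from the end and stops after n entries.
    return counts.most_common()[:-n - 1:-1]

# Made-up sample data in place of the words read from the file
counts = Counter("a a a a a b b b b c c c d d e".split())

for word, count in least_common(counts, 3):
    print('%s %s' % (word, count))
```

Words with equal counts can appear in either order relative to each other, so with real text the exact three words shown may depend on ties.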