Question

I'm trying to count all the words in a file, sort them, and return the 20 most frequent words. Here is my code:

import sys

def helper_function(filename):
  the_file = open(filename, 'r')
  words_count = {}
  lines_in_file = the_file.readlines()
  for line in lines_in_file:
    words_list = line.split()
    for word in words_list:
      word = word.lower()  # normalize case before the lookup so 'The' and 'the' share one count
      if word in words_count:
        words_count[word] += 1
      else:
        words_count[word] = 1
  return words_count


def print_words(filename):
  words_count = helper_function(filename)
  for w in sorted(words_count.keys()): print w, words_count[w]

def print_top(filename):
  words_count = helper_function(filename)
  for w in sorted(words_count.values()): print w

def main():
  if len(sys.argv) != 3:
    print 'usage: ./wordcount.py {--count | --topcount} file'
    sys.exit(1)

  option = sys.argv[1]
  filename = sys.argv[2]
  if option == '--count':
    print_words(filename)
  elif option == '--topcount':
    print_top(filename)
  else:
    print 'unknown option: ' + option
    sys.exit(1)

if __name__ == '__main__':
  main()

The way I defined print_top() prints only the sorted values of the words_count dictionary, but I want to print each entry like this: word: count

Your suggestions are much appreciated!

Answer 1

You're close. Just sort the dict items by value (that is what itemgetter(1) does: it picks the count out of each (word, count) pair).

>>> word_count = {'The' : 2, 'quick' : 8, 'brown' : 4, 'fox' : 1 }
>>> from operator import itemgetter
>>> for word, count in reversed(sorted(word_count.iteritems(), key=itemgetter(1))):
...     print word, count
...
quick 8
brown 4
The 2
fox 1
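
The session above is Python 2 (iteritems and the print statement). As a rough sketch of the same idea under Python 3, which is an assumption about your setup, items() replaces iteritems() and reverse=True replaces the reversed() wrapper. The helper name sorted_by_count is made up for illustration:

```python
from operator import itemgetter

# Assumption: Python 3. sorted_by_count is a hypothetical helper name.
def sorted_by_count(word_count):
    """Return (word, count) pairs ordered from highest to lowest count."""
    # itemgetter(1) selects the count from each (word, count) pair.
    return sorted(word_count.items(), key=itemgetter(1), reverse=True)

word_count = {'The': 2, 'quick': 8, 'brown': 4, 'fox': 1}
for word, count in sorted_by_count(word_count):
    print(word, count)
```

This should list the words in the same order as the Python 2 session above.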

Edit

For the "top 20", I'd suggest looking at heapq, which can pick the n largest items without sorting the whole dict:

>>> import heapq
>>> heapq.nlargest(3, word_count.iteritems(), itemgetter(1))
[('quick', 8), ('brown', 4), ('The', 2)]
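
If you would rather not build the dictionary by hand, a different standard-library route (not the heapq approach shown above) is collections.Counter, whose most_common(n) method returns the n highest counts as (word, count) pairs. A minimal sketch, assuming Python 3 and that splitting on whitespace is good enough for your file; top_words and the sample text are invented for the example:

```python
from collections import Counter

# Assumption: Python 3. top_words is a hypothetical helper name.
def top_words(text, n=20):
    """Return the n most frequent lowercased words as (word, count) pairs."""
    counts = Counter(word.lower() for word in text.split())
    return counts.most_common(n)

sample = "the quick brown fox jumps over the lazy dog The end"
for word, count in top_words(sample, n=3):
    print('{}: {}'.format(word, count))
```

For a real file you would pass the file's contents (for example the result of read()) as text.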

Answer 2

To get output in the form "Key: Value", once your dictionary has been filled with keys and values, use a function that returns the pairs like this:

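def getAllKeyValuePairs():
    # dict_name is the dictionary you filled earlier.
    # Build the full list; a return inside the loop would stop after the first key.
    return [key + ": " + str(dict_name[key]) for key in sorted(dict_name)]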

Or for one specific key-value pair:

def getTheKeyValuePair(key):
    if key in dict_name:
        return key + ": "+ str(dict_name[key])
    else:
        return "No such key (" + key + ") in the dictionary"
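
The functions above order by key, while the question asks for the highest counts first. One possible way to combine both ideas is sketched below, assuming Python 3. format_top is a hypothetical name, and breaking ties alphabetically is my own choice to keep the output deterministic, not something the question requires:

```python
# Assumption: Python 3. format_top is a hypothetical helper name.
def format_top(words_count, n=20):
    """Return up to n 'word: count' strings, highest count first."""
    # Sort by descending count; words with equal counts are ordered alphabetically.
    ranked = sorted(words_count.items(), key=lambda kv: (-kv[1], kv[0]))
    return ['{}: {}'.format(word, count) for word, count in ranked[:n]]

words_count = {'the': 3, 'fox': 1, 'dog': 1, 'quick': 2}
print('\n'.join(format_top(words_count, n=3)))
```

Returning the strings instead of printing them keeps the formatting separate from the output, so the same function could feed print_top in the question.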

Python: sort a dictionary by value and print both key and value
