Question

I have a dictionary of names and the number of times each name appears in a phone book:

names_dict = {
    'Adam': 100,
    'Anne': 400,
    'Britney': 321,
    'George': 645,
    'Joe': 200,
    'John': 1010,
    'Mike': 500,
    'Paul': 325,
    'Sarah': 150
}

Preferably without using sorted(), I want to iterate over the dictionary and create a new dictionary containing only the top five names:

def sort_top_list():
    # create dict of any 5 names first
    new_dict = {}
    for i in list(names_dict)[:5]:
        new_dict[i] = names_dict[i]

    # Find smallest current value in new_dict
    # and compare to others in names_dict
    # to find bigger ones; replace smaller name in new_dict with bigger name
    for k, v in names_dict.items():
        current_smallest = min(new_dict.values())
        if v > current_smallest:
            # Found a bigger value; replace smaller key/value in new_dict with larger key/value
            new_dict[k] = v
            # ?? delete old key/value pair from new_dict somehow

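I can build a new dictionary that gains a new key/value pair every time the loop over names_dict finds a name/count higher than the smallest one in new_dict. However, I can't figure out how to remove the smaller entry from new_dict after adding the bigger one from names_dict.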
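A minimal sketch of the missing deletion step, keeping the asker's approach (seed with any n names, then replace the current smallest). It assumes Python 3; the function name top_n_by_replacement is hypothetical. The key with the smallest count is found with min(new_dict, key=new_dict.get) and removed with del before the larger entry is added.

```python
def top_n_by_replacement(counts, n):
    """Return a new dict with the n keys of `counts` that have the largest values.

    Hypothetical helper: seeds the result with the first n items, then
    replaces the current smallest entry whenever a larger value appears.
    No sorted() call is used.
    """
    if n <= 0:
        return {}
    items = iter(counts.items())
    new_dict = {}
    # Seed with the first n items (any n names will do).
    for k, v in items:
        new_dict[k] = v
        if len(new_dict) == n:
            break
    # The same iterator continues after the seeded items.
    for k, v in items:
        smallest_key = min(new_dict, key=new_dict.get)
        if v > new_dict[smallest_key]:
            del new_dict[smallest_key]  # drop the smaller name...
            new_dict[k] = v             # ...and keep the larger one
    return new_dict


names_dict = {
    'Adam': 100, 'Anne': 400, 'Britney': 321, 'George': 645, 'Joe': 200,
    'John': 1010, 'Mike': 500, 'Paul': 325, 'Sarah': 150,
}
print(top_n_by_replacement(names_dict, 5))
```

This costs O(N·n) because min() rescans new_dict on every step, which is exactly the rescan that a heap-based approach avoids.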

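Is there a better way, without importing special libraries or using sorted(), to iterate over a dict and create a new dict containing the N keys with the highest values?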

Answer 1

You should use the heapq.nlargest() function for this:

import heapq
from operator import itemgetter

top_names = dict(heapq.nlargest(5, names_dict.items(), key=itemgetter(1)))

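This uses an algorithm that is more efficient than sorting the whole dictionary (O(N log K) for a dictionary of size N and K top items, versus O(N log N) for a full sort) to extract the top 5 items as (key, value) tuples, which are then passed to dict() to create a new dictionary.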

Demo:

>>> import heapq
>>> from operator import itemgetter
>>> names_dict = {'Adam': 100, 'Anne': 400, 'Britney': 321, 'George': 645, 'Joe': 200, 'John': 1010, 'Mike': 500, 'Paul': 325, 'Sarah': 150}
>>> dict(heapq.nlargest(5, names_dict.items(), key=itemgetter(1)))
{'John': 1010, 'George': 645, 'Mike': 500, 'Anne': 400, 'Paul': 325}
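To illustrate where the O(N log K) bound comes from, here is a simplified sketch of the bounded min-heap idea built from heapq's low-level functions. It is not CPython's actual nlargest() source, which handles more cases; top_k_items is a hypothetical name. The heap never holds more than k entries, so each push or replace costs O(log k). It stores (value, key) tuples, so it assumes keys are mutually comparable (strings are) in case two values tie.

```python
import heapq


def top_k_items(d, k):
    """Return the k (key, value) pairs of d with the largest values, largest first."""
    if k <= 0:
        return []
    heap = []  # min-heap of (value, key); heap[0] is the smallest value kept
    for key, value in d.items():
        if len(heap) < k:
            heapq.heappush(heap, (value, key))
        elif value > heap[0][0]:
            # Evict the smallest kept entry and insert the new one in O(log k).
            heapq.heapreplace(heap, (value, key))
    # Pop everything (smallest first), then reverse to get largest first.
    ordered = [heapq.heappop(heap) for _ in range(len(heap))]
    ordered.reverse()
    return [(key, value) for value, key in ordered]


names_dict = {'Adam': 100, 'Anne': 400, 'Britney': 321, 'George': 645, 'Joe': 200,
              'John': 1010, 'Mike': 500, 'Paul': 325, 'Sarah': 150}
print(dict(top_k_items(names_dict, 5)))
```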

You may want to use the collections.Counter() class instead. Its Counter.most_common() method makes your use case trivial; when given a count n, that method's implementation uses heapq.nlargest().
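A short sketch of the Counter route, assuming the counts are already in names_dict: Counter accepts an existing mapping, and most_common(5) returns a list of (name, count) pairs ordered from most to least common, which dict() turns back into a dictionary.

```python
from collections import Counter

names_dict = {'Adam': 100, 'Anne': 400, 'Britney': 321, 'George': 645, 'Joe': 200,
              'John': 1010, 'Mike': 500, 'Paul': 325, 'Sarah': 150}

name_counts = Counter(names_dict)
top_names = dict(name_counts.most_common(5))
print(top_names)  # {'John': 1010, 'George': 645, 'Mike': 500, 'Anne': 400, 'Paul': 325}
```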

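These are not special libraries; they are part of the Python standard library. Otherwise you would have to implement a binary heap yourself to do this. Unless you are specifically studying this algorithm, there is little point in reimplementing it: the Python implementation is highly optimized, and some key functions are implemented in a C extension.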
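For comparison, a minimal sketch of what implementing the binary heap yourself would involve: a hand-written bounded min-heap with no imports and no sorted(). The helper names _sift_up, _sift_down and top_n_no_imports are hypothetical, and the code is meant for illustration, not as a replacement for heapq. Entries are compared by value only, so keys need not be comparable; the returned dict is in heap order, not sorted order.

```python
def _sift_up(heap, i):
    # Move heap[i] up while its value is smaller than its parent's.
    while i > 0:
        parent = (i - 1) // 2
        if heap[i][0] < heap[parent][0]:
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
        else:
            break


def _sift_down(heap, i):
    # Move heap[i] down while its value is larger than its smaller child's.
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < n and heap[left][0] < heap[smallest][0]:
            smallest = left
        if right < n and heap[right][0] < heap[smallest][0]:
            smallest = right
        if smallest == i:
            break
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest


def top_n_no_imports(d, n):
    """Return a dict of the n entries of d with the largest values."""
    if n <= 0:
        return {}
    heap = []  # min-heap of (value, key) pairs; heap[0] holds the smallest value
    for key, value in d.items():
        if len(heap) < n:
            heap.append((value, key))
            _sift_up(heap, len(heap) - 1)
        elif value > heap[0][0]:
            heap[0] = (value, key)  # overwrite the current minimum
            _sift_down(heap, 0)     # restore the heap property
    return {key: value for value, key in heap}
```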

Answer 2

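I don't know why you don't want to use sorting. This solution isn't perfect and doesn't even exactly match your question, but I hope it inspires you to find your own implementation. I assume this is just a short example of the real problem you're facing.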

But as you can see in the other answer, it's usually better to use existing code than to do everything yourself.

names_dict = {'Joe' : 200, 'Anne': 400, 'Mike': 500, 'John': 1010, 'Sarah': 150, 'Paul': 325, 'George' : 645, 'Adam' : 100, 'Britney': 321}

def extract_top_n(dictionary, count):
    # First step: find the topmost values, keeping the list in
    # descending order with a simple insertion step (no sorted()).
    highest_values = []
    for v in dictionary.values():
        highest_values.append(v)
        # Move the new value left while it is larger than its neighbour.
        i = len(highest_values) - 1
        while i > 0 and highest_values[i] > highest_values[i - 1]:
            highest_values[i], highest_values[i - 1] = highest_values[i - 1], highest_values[i]
            i -= 1
        # Keep only the `count` largest values seen so far.
        highest_values = highest_values[:count]

    if not highest_values:
        return {}

    # Fill the dictionary with all entries at least as big as the smallest of the biggest.
    # Note: if several names share that smallest value, the result will
    # contain more than `count` entries.
    last_interesting = highest_values[-1]
    return_dictionary = {}
    for k, v in dictionary.items():
        if v >= last_interesting:
            return_dictionary[k] = v
    return return_dictionary

print(extract_top_n(names_dict, 3))
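If you need exactly `count` entries even when several names tie on the boundary value, one variant of the same insertion idea is to keep (key, value) pairs in the running list instead of bare values. This is a sketch; extract_top_n_exact is a hypothetical name, and which of the tied names survives depends on the dictionary's iteration order (earlier keys win ties here).

```python
def extract_top_n_exact(dictionary, count):
    """Return a dict with at most `count` entries having the largest values."""
    if count <= 0:
        return {}
    top = []  # (key, value) pairs kept in descending order of value
    for k, v in dictionary.items():
        top.append((k, v))
        # Insertion step: move the new pair left past strictly smaller values.
        i = len(top) - 1
        while i > 0 and top[i][1] > top[i - 1][1]:
            top[i], top[i - 1] = top[i - 1], top[i]
            i -= 1
        del top[count:]  # discard anything beyond the first `count` pairs
    return dict(top)
```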

New dictionary of the top n values (and keys) from a dictionary (Python)
