Question

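I am working on a Python project that runs for a couple of hours before it has finished all of its calculations. I would like to keep the top 10 results of the calculation as it progresses.

There is the obvious way of doing it: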

if calc > highest_calc:
    second_calc = highest_calc
    highest_calc = calc
if calc < highest_calc and calc > second_calc:
    third_calc =  second_calc
    second_calc = calc
if calc < second_calc and calc > third_calc:
    fourth_calc = third_calc
    third_calc = calc
etc.

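But is there a better, more dynamic and Pythonic way?

Bonus

For my project, each calculation has three corresponding names: name_a, name_b, and name_c. What I don't want is more than one of the top 10 values having the same three names. If the latest calc has the same names as an existing entry, I want to keep the higher of the two. What is the best way to do this?

For example, say 2.3 is the value of calc, computed using MCD, SBUX, and CAT. But what if I already made a calc using MCD, SBUX, and CAT, and it also made it into the top 10? How do I find the value of that earlier calc so I can see whether it is less than or greater than the new calc? If the new calc is greater, remove the old calc and add the new one. If it is less, skip the new calc. Hopefully that makes sense: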

If name_a in top10 and name_b in top10 and name_c in top10:
   if calc > old_calc_with_same_names:
       add_calc = calc, name_a, name_b, name_c
       top10.insert(bisect.bisect(calc, top10[0]), add_calc)
else:
   add to top10

Full code

csc = []
top_ports = []
add_sharpe = [sharpe, name_a, weight_a, exchange_a, name_b, weight_b, exchange_b, name_c, weight_c, exchange_c]
    if init__calc == 0:
            csc.append(add_sharpe)
    if init__calc > 1:
        if name_a == prev_name_a and name_b == prev_name_b and name_c == prev_name_c:
            csc.append(add_sharpe)
        if name_a != prev_name_a or name_b != prev_name_b or name_c != prev_name_c:
            if csc:
                hs = max(csc, key=lambda x: x[0])
                if top_ports:
                    ls = min(top_ports, key=lambda x: x[0])
                    if hs[0] > ls[0]:
                        hsi = csc.index(hs)
                        top_ports.append(csc[hsi])
                else:
                    hsi = csc.index(hs)
                    top_ports.append(csc[hsi])
            csc = []
            csc.append(add_sharpe)

Later in the script...

top_ports = sorted(top_ports, key=itemgetter(0), reverse=True)
print "The highest sharpe is: {0}".format(top_ports[0])
print " ==============================================="
print " ==============================================="
print datetime.now() - startTime
print "Second: {0}".format(top_ports[1])
print "Third: {0}".format(top_ports[2])
print "Fourth: {0}".format(top_ports[3])
print "Fifth: {0}".format(top_ports[4])

etc.

Answer 1

The simplest way is to store all the scores in a list, sort it in reverse order (highest first), and take the first 10.

import random
# sample random scores
scores = [int(1000*random.random()) for x in xrange(100)]

# uncomment if scores must be unique
#scores = set(scores)
topten = sorted(scores, reverse=True)[:10]

print topten

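If you need to prevent duplicate scores in the list, use a set.

This is the 'vanilla' method of getting the top 10 scores, but it misses an opportunity for optimization that will make a difference for larger data sets.

Namely, if a list of the top 10 scores is maintained as scores are added, the whole list does not need to be sorted every time the top 10 is requested. For this, one could keep two lists: the full list and the top 10. For the latter, the heapq method suggested by @thijs van Dien is superior.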
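The two-list idea above could be sketched as follows. This is only an illustration written for Python 3, not part of the original answer: the helper name `add_score` and the choice of `bisect.insort` to keep the small list ordered are assumptions.

```python
import bisect
import random

def add_score(top, score, limit=10):
    """Insert score into the ascending list `top`, keeping at most `limit` items."""
    if len(top) < limit:
        bisect.insort(top, score)
    elif score > top[0]:
        # The new score beats the current lowest entry: insert it, drop the lowest.
        bisect.insort(top, score)
        del top[0]

random.seed(0)  # fixed seed so the demo is repeatable
scores = [random.randrange(1000) for _ in range(100)]

all_scores = []  # the full list, kept only if it is needed for something else
top = []         # at most 10 items, ascending
for s in scores:
    all_scores.append(s)
    add_score(top, s)

topten = top[::-1]  # highest first
print(topten)
```

The top 10 is available at any point without re-sorting `all_scores`; if the full history is not needed, that list can be dropped entirely.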

Answer 2

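Use the heapq module. Instead of needlessly storing all results, at every step it adds the new result and then efficiently removes the lowest one (which may be the one just added), effectively keeping the top 10. Storing all results is not necessarily bad, though; it can be valuable for collecting statistics, and it makes it easier to decide afterwards what to keep.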

from heapq import heappush, heappushpop

heap = []
for x in [18, 85, 36, 57, 2, 45, 55, 1, 28, 73, 95, 38, 89, 15, 7, 61]:
    calculation_result = x + 1 # Dummy calculation
    if len(heap) < 10:
        heappush(heap, calculation_result)
    else:
        heappushpop(heap, calculation_result)

top10 = sorted(heap, reverse=True) # [96, 90, 86, 74, 62, 58, 56, 46, 39, 37]

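Note that this module has more useful functions, for example to request only the highest/lowest value. That may help you add the behavior concerning the names.

In fact this construct is so common that it is available as heapq.nlargest. However, in order not to store all results, you have to model the calculator as a generator, which is a bit more advanced.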
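As a rough illustration of how the heap can carry the names along, the sketch below (Python 3) stores `(value, names)` tuples and peeks at `heap[0]`, which heapq guarantees to be the smallest item. The sample results and the top-3 limit are made up for the demo.

```python
from heapq import heappush, heappushpop

# Hypothetical sample results: (value, (name_a, name_b, name_c))
results = [
    (2.3, ("MCD", "SBUX", "CAT")),
    (1.1, ("AAPL", "IBM", "GE")),
    (3.7, ("KO", "PEP", "XOM")),
    (0.4, ("BP", "CVX", "F")),
    (2.9, ("T", "VZ", "GM")),
]

LIMIT = 3  # top 3 instead of top 10 to keep the demo short
heap = []
for value, names in results:
    if len(heap) < LIMIT:
        heappush(heap, (value, names))
    elif value > heap[0][0]:
        # heap[0] is the lowest kept entry; only replace it when beaten.
        heappushpop(heap, (value, names))

top3 = sorted(heap, reverse=True)
for value, names in top3:
    print(value, names)
```

Tuples compare element by element, so entries are ordered by value first. This does not yet enforce one entry per name combination.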

from heapq import nlargest

def calculate_gen():
    for x in [18, 85, 36, 57, 2, 45, 55, 1, 28, 73, 95, 38, 89, 15, 7, 61]:
        yield x + 1 # Dummy calculation

top10 = nlargest(10, calculate_gen()) # [96, 90, 86, 74, 62, 58, 56, 46, 39, 37]

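Bonus

Here are some ideas for making the results unique for each combination of associated names.

Using a heap is not going to cut it anymore, because a heap is not good at locating any item that isn't the absolute minimum/maximum, and what we are interested in here is a kind of local minimum given the criterion of the name combination.

Instead, you can use a dict to keep the highest value for each name combination. First you need to encode the name combination as an immutable value so that it can serve as a key, and because the order of the names shouldn't matter, decide on some order and stick with it. I'm going with alphabetically sorted strings to keep it simple.

In the code below, each result is put into the dict at a place that is unique to its name combination (which is why normalization may be needed), as long as there isn't a better result there already. Later, the top n is compiled from the highest results for each combination.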

from heapq import nlargest

calculations = [('ABC', 18), ('CDE', 85), ('BAC', 36), ('CDE', 57),
                ('ECD',  2), ('BAD', 45), ('EFG', 55), ('DCE',  1)]

highest_per_name_combi = dict()

for name_combi, value in calculations:
    normal_name_combi = ''.join(sorted(name_combi)) # Slow solution
    current = highest_per_name_combi.get(normal_name_combi, float('-inf'))
    highest_per_name_combi[normal_name_combi] = max(value, current)

top3 = nlargest(3, highest_per_name_combi.iteritems(), key=lambda x: x[1])

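The only problem with this approach might be the amount of memory used. Since 150 names allow for 551,300 (150 choose 3) combinations, you may have to decide to clean up the dict every now and then, which is simple. In the loop, check the size of the dict, and if it exceeds some (still large) number, compose the current top n and create a new, minimal dict from it. Also, some micro-optimizations could be applied by reducing the number of lookups/calls, e.g. by not using get and/or max.

All of this would be a lot easier if you had control over the order in which the calculations are performed. If you knew that the next 1000 calculations are all for the same name combination, you could just find the best of those first before adding it to the overall results.

Also, with a truly massive number of results, the simplest way may actually be the best. Just write them to a file in a convenient format, sort them there (first by name combination, then in reverse by value), take only the first occurrence for each name combination (easy when they are grouped), and sort the result again, just by value.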
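The clean-up step described above might look roughly like this in Python 3. The `prune` helper and the `MAX_ENTRIES` threshold are hypothetical, and the threshold is deliberately tiny so that pruning actually triggers on the sample data; in practice it would be a large number.

```python
from heapq import nlargest

TOP_N = 3
MAX_ENTRIES = 3  # hypothetical threshold, tiny for the demo

def prune(best, n):
    """Return a new dict holding only the n highest-valued entries of `best`."""
    return dict(nlargest(n, best.items(), key=lambda item: item[1]))

calculations = [('ABC', 18), ('CDE', 85), ('BAC', 36), ('CDE', 57),
                ('ECD',  2), ('BAD', 45), ('EFG', 55), ('DCE',  1)]

best = {}
for name_combi, value in calculations:
    key = ''.join(sorted(name_combi))  # normalize the order of the names
    if value > best.get(key, float('-inf')):
        best[key] = value
    if len(best) > MAX_ENTRIES:
        best = prune(best, TOP_N)  # shrink back to the current top n

top3 = nlargest(TOP_N, best.items(), key=lambda item: item[1])
print(top3)
```

A combination that was pruned can come back later with a lower value, but since it was already below the top n when it was dropped, it cannot displace a legitimate entry (ties aside).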
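If the calculations really do arrive grouped by name combination, one possible sketch (Python 3) uses `itertools.groupby` to reduce each consecutive run to its best result before handing it to `nlargest`. The generator and its data are stand-ins for the real calculation, and the names are assumed to be normalized already.

```python
from heapq import nlargest
from itertools import groupby

def calculate_gen():
    # Stand-in for the real calculation; all results for one name
    # combination are assumed to be produced consecutively.
    for item in [('ABC', 18), ('ABC', 36), ('CDE', 85), ('CDE', 57),
                 ('CDE', 2), ('ABD', 45), ('EFG', 55)]:
        yield item

def best_per_group(results):
    """Yield only the highest (name_combi, value) of each consecutive run."""
    for _, group in groupby(results, key=lambda r: r[0]):
        yield max(group, key=lambda r: r[1])

top3 = nlargest(3, best_per_group(calculate_gen()), key=lambda r: r[1])
print(top3)
```

Nothing is stored beyond the current run and the top n, so no dict of all combinations is needed.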
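The file-based route could be sketched like this in Python 3, under the assumption that the results fit in memory once read back; a truly huge file would need an external sort, which is not shown. The CSV format, the temporary file, and the sample rows are all choices made for the demo.

```python
import csv
import os
import tempfile
from itertools import groupby
from operator import itemgetter

# Hypothetical results with already-normalized name combinations.
results = [('ABC', 18), ('CDE', 85), ('ABC', 36), ('CDE', 57),
           ('CDE', 2), ('ABD', 45), ('EFG', 55), ('CDE', 1)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'results.csv')
    # During the run: write every result to the file.
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(results)
    # Afterwards: read everything back.
    with open(path, newline='') as f:
        rows = [(name, float(value)) for name, value in csv.reader(f)]

# Sort by name combination, then by value descending.
rows.sort(key=lambda r: (r[0], -r[1]))
# The first row of each group is the best result for that combination.
best = [next(group) for _, group in groupby(rows, key=itemgetter(0))]
# Finally sort by value alone and take the top n.
top3 = sorted(best, key=itemgetter(1), reverse=True)[:3]
print(top3)
```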

Answer 3

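Thanks to the comments, here is my improved solution, using the idea of building up a topten list. Using heapq, as shown in another answer, is obviously much better. This solution has a worst-case running time of N * 10, whereas using a heap reduces that to N * log2(10). The difference may become noticeable if one wants not the top ten but, for example, the top hundred thousand values. More importantly, using heapq has great advantages in readability, understandability, and correctness.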
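To put rough numbers on that comparison, the snippet below (Python 3) prints the per-insert worst-case step counts for a list of size k next to log2(k). These are order-of-magnitude figures only, not measurements.

```python
import math

for k in (10, 100000):
    list_steps = k             # worst case: shift through the whole top-k list
    heap_steps = math.log2(k)  # roughly the depth of a heap with k items
    print(k, list_steps, round(heap_steps, 1))
```

For k = 10 the gap is about 10 versus 3.3 steps per insert; for k = 100000 it is 100000 versus about 16.6.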

data = [18, 85, 73, 36, 57, 2, 45, 55, 1, 28, 73, 95, 38, 89, 15, 7, 61]

# start off the topten list
# with a sentinel value to simplify the add loop.
sentinel = 12345   # the sentinel could be any value.
topten = [sentinel]

def add(newvalue):
    length = len(topten)

    # temporarily overwrite the sentinel with the new value
    topten[-1] = newvalue

    # find the right place in the topten for the new value
    # iterate over topten in reverse order, skipping the sentinel position
    for i in xrange(-2, -length-1, -1): # -2, -3, ..., -length
        if newvalue > topten[i]:
            topten[i+1] = topten[i]
            topten[i] = newvalue
        else:
            break

    # fix up the topten list.
    # if we haven't yet gathered all top ten, grow the list
    # else discard the last element of the list.
    if length < 11:
        topten.append(sentinel)
    else: # length >= 11 i.e. == 11
        topten[-1] = sentinel

for v in data: add(v)
print topten[:-1] # drop the sentinel

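By maintaining a set, it should be possible to add uniqueness based on the names.

For reference, my initial solution follows. It has the problem of choosing an initial value, and of spurious entries if the total number of calculations is less than 10.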
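One way the set idea might be realized is sketched below in Python 3. It is a simplified variant rather than an extension of the sentinel code above: the top list is simply re-sorted after each change, names are stored as a frozenset so their order does not matter, and the `add` signature and sample data are assumptions.

```python
def add(top, names_in_top, value, names, limit=10):
    """Keep `top` as a descending list of (value, names) with unique names."""
    key = frozenset(names)  # order of the three names is irrelevant
    if key in names_in_top:
        old = next(entry for entry in top if entry[1] == key)
        if value <= old[0]:
            return  # the existing entry is at least as good; skip
        top.remove(old)
    top.append((value, key))
    top.sort(key=lambda entry: entry[0], reverse=True)
    names_in_top.add(key)
    if len(top) > limit:
        dropped = top.pop()  # lowest entry falls off the list
        names_in_top.discard(dropped[1])

top, names_in_top = [], set()
sample = [(2.3, ("MCD", "SBUX", "CAT")), (1.1, ("AAPL", "IBM", "GE")),
          (3.0, ("CAT", "MCD", "SBUX")), (0.5, ("KO", "PEP", "MCD")),
          (2.0, ("AAPL", "GE", "IBM")), (1.5, ("XOM", "CVX", "BP"))]
for value, names in sample:
    add(top, names_in_top, value, names, limit=3)

for value, key in top:
    print(value, sorted(key))
```

The set only answers "is this combination currently in the list?"; finding the old entry is still a linear scan, which is cheap for a list of ten.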

data = [18, 85, 73, 36, 57, 2, 45, 55, 1, 28, 73, 95, 38, 89, 15, 7, 61]

import sys
floor = -sys.maxint - 1  # won't work in Python 3, as there is no sys.maxint
                         # for float, use float('-inf')
topten = [floor] * 10

def add(newvalue):
    # iterate over topten in reverse order
    for i in xrange(-1, -11, -1): # -1, -2, ..., -10. 
        if newvalue > topten[i]:
            if i < -1:
                topten[i+1] = topten[i]
            topten[i] = newvalue
        else:
            break

for v in data: add(v)
print topten

Pythonic way to store top 10 results
