Question

I am running a Monte Carlo simulation. As part of this task I generate samples uniformly distributed over the interval (0,100).

generate = lambda: uniform(0,100)

The iteration stops when every pair of adjacent generated points meets the criterion.

check = lambda a,b: True if (b-a)<5 else False

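I need a structure that efficiently keeps all the generated points and lets me walk through them in ascending order, so that I can run check on every pair of consecutive points.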

Python has the heapq module, which provides a very efficient heap structure, so I decided to use it.

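I ran into a problem: this module offers no way to traverse the heap. The only way I found to access the heap's values in ascending order is heapq.heappop, but that removes the values from the heap.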

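My workaround is to copy the heap into a new object and iterate over the copy with heappop. But I don't think copying the whole structure in memory on every iteration is very efficient.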

Is there another way to do this more efficiently?

Simplified code for illustration:

import heapq
from random import uniform
from itertools import tee, izip, count
from copy import copy


def pairwise(iterable): #get values from iterator in pairs
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)


check = lambda a,b: True if (b-a)<5 else False
generate = lambda: uniform(0,100)


def iterate_heap(heap):
    heap = copy(heap) #Here I have to copy the heap to be able to traverse
    try:
        while True:
            yield heapq.heappop(heap)
    except IndexError:
        return


def trial():
    items = []

    for i in count():
        item = generate()
        heapq.heappush(items, item)

        it = iterate_heap(items)
        it = pairwise(it)

        if i>0 and all(check(a,b) for a,b in it): #if i==0 then 'it' returns no values and 'all' returns True
            return i

print "The solution is reached. It took %d iterations." % trial()

The pairwise function comes from the itertools recipes in the Python documentation.

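Update: in this heappop-based implementation, the complexity of each iteration is O(n*log(n)):

Copying the heap: O(n)

Adding a new value to the heap: O(log(n))

Traversal: n elements * O(log(n)) to pop each value from the heap -> O(n*log(n)).

Result: O(n+log(n)+n*log(n)) = O(n*log(n))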

But I would like the traversal to be O(n), so that the resulting complexity is O(n).

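By the way, if we simply use a sorted list, we have to sort the list after every addition, which is O(n*log(n)), while the traversal is n*O(1) -> O(n). So the resulting complexity is still O(n*log(n)).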

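I found a solution: use the bisect module. Finding the position to insert at is O(log(n)). Inserting into the list is O(n) (because the implementation has to shift every value after the insertion point). Traversal is O(n). So the resulting complexity is O(n).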
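A minimal sketch of the bisect approach described above, under a few assumptions: the helper name all_gaps_small and the generate/max_gap parameters are made up for illustration, and the return value mirrors the iteration counter of the original trial function.

```python
import bisect
from random import uniform


def all_gaps_small(sorted_items, max_gap=5):
    """Return True if every pair of neighbours is closer than max_gap."""
    return all(b - a < max_gap
               for a, b in zip(sorted_items, sorted_items[1:]))


def trial(generate=lambda: uniform(0, 100), max_gap=5):
    items = []       # kept in ascending order at all times
    iterations = 0   # plays the role of i in the original loop
    while True:
        # O(log n) to find the position, O(n) to shift the tail.
        bisect.insort(items, generate())
        # Same guard as "i > 0" in the original: at least two points needed.
        if len(items) > 1 and all_gaps_small(items, max_gap):
            return iterations
        iterations += 1
```

Because the list is always sorted, the pairwise check is a single linear pass with no copying of a heap and no popping.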

Still, I am curious whether there is a way to solve this with heaps in Python.

Answer 1

I would use list.sort() on the heap. That leaves the heap invariant intact and makes it possible to iterate over the underlying list directly.

FWIW, the Timsort algorithm used by list.sort will take advantage of the partial ordering that already exists in the heap.
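A small sketch of this suggestion, with made-up sample values and a hypothetical helper name sorted_view. It relies only on the fact that a list in ascending order also satisfies the min-heap invariant.

```python
import heapq


def sorted_view(heap):
    """Sort the heap's list in place and return it for ordered iteration."""
    heap.sort()  # an ascending list is still a valid min-heap
    return heap


heap = []
for value in [42.0, 7.5, 99.1, 13.3]:
    heapq.heappush(heap, value)

ordered = list(sorted_view(heap))  # ascending traversal, nothing is popped
heapq.heappush(heap, 1.0)          # heap operations keep working afterwards
smallest = heapq.heappop(heap)     # the value that was just pushed
```

Note that sorting on every iteration is still O(n*log(n)) in the worst case, so this trades the copy for a sort rather than reaching the O(n) traversal asked about in the question.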

Answer 2

From the Python docs:

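These two make it possible to view the heap as a regular Python list without surprises: heap[0] is the smallest item, and heap.sort() maintains the heap invariant!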

Is there a reason you can't just treat the heap as a list and iterate over it?
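One caveat worth illustrating: iterating over the raw heap list does not by itself give ascending order; it only does so after heap.sort(). The sketch below uses a hand-written heap layout and a hypothetical is_heap helper to check the invariant quoted above.

```python
def is_heap(items):
    """Check the min-heap invariant: every parent <= each of its children."""
    return all(items[(i - 1) // 2] <= items[i] for i in range(1, len(items)))


heap = [1, 3, 2, 5, 9, 8]      # a valid min-heap layout, but not sorted
valid_before = is_heap(heap)
sorted_before = heap == sorted(heap)

heap.sort()                    # in-place sort of the same list
valid_after = is_heap(heap)    # the invariant still holds
sorted_after = heap == sorted(heap)
```

Here valid_before and valid_after are both true, while only sorted_after is true, which is why the sort step is needed before the pairwise check.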

Answer 3

I have done some efficiency measurements.

The best performance comes from the bisect module: 10000 insertions into the middle of a list took 0.037 seconds on my computer (Python 2.7).

Using sortedlist from the blist module took 0.287 seconds for the same number of insertions.

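Using a traditional list with a sort after each append took 2.796 seconds. (Python now uses the Timsort algorithm, which is said to be very efficient on nearly sorted lists, but it still turns out to be less efficient than bisect.)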

The code I used for these measurements:

import bisect
import timeit
import __main__
import blist

N = 10000 #Number of executions
L = 1000 #Length of initial list

def test_f_bisect(a):
    bisect.insort_right(a,500)


def test_f_list_sort(a):
    a.append(500)
    a.sort()


test_f_blist_init = '''
from __main__ import test_f_blist
import blist
a = blist.sortedlist(range({L}))
'''.format(L=L)
def test_f_blist(a):
    a.add(500)


names = dir(__main__)
for name in names:
    attr = getattr(__main__,name)
    if hasattr(attr,'__call__'):
        if name.startswith('test_f_'):
            init_name = name + '_init'
            if hasattr(__main__, init_name):
                init = getattr(__main__,init_name)
            else:
                init = 'from __main__ import {name}; a = list(range({L}))'.format(name=name, L=L)
            t = timeit.Timer(stmt='{name}(a)'.format(name=name),
                             setup=init)

            time = t.timeit(N)
            print('{name}: {time}'.format(name=name,time=time))

Answer 4

For the record, the right data structure in this case is a B-tree. There is an implementation:

 from blist import sortedlist

It has the lowest runtime complexity: O(n*log(n)) to build the list, O(n) to iterate.

Answer 5

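I wrote an iterator class that performs a lazy, in-order traversal of a min-heap. It has the following advantages:

It does not need a copy of the original heap
It does not modify the original heap
Lazy iteration is more efficient if you stop early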

To keep track of the next items to visit, I simply use another heap, self.next_items.

import heapq

class HeapIter:

    def __init__(self, heap):
        self.original_heap = heap
        self.next_items = []
        if len(self.original_heap) > 0:
            self.next_items.append((self.original_heap[0], 0))

    def current_element(self):
        if len(self.next_items) == 0:
            return None
        return self.next_items[0][0]

    def next(self):
        if len(self.next_items) == 0:
            return None
        next_elem, next_index = heapq.heappop(self.next_items)
        child_1 = 2 * next_index + 1
        child_2 = child_1 + 1
        if child_1 < len(self.original_heap):
            heapq.heappush(self.next_items, (self.original_heap[child_1], child_1))
        if child_2 < len(self.original_heap):
            heapq.heappush(self.next_items, (self.original_heap[child_2], child_2))
        return next_elem
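The same idea can also be sketched as a generator, which plugs directly into for loops and helpers such as pairwise. This is an illustrative variant rather than the class above; the function name iter_heap_sorted and the local name frontier are made up here.

```python
import heapq


def iter_heap_sorted(heap):
    """Yield the items of a min-heap list in ascending order, lazily."""
    if not heap:
        return
    # Candidates for the next smallest item, stored as (value, index) pairs.
    frontier = [(heap[0], 0)]
    while frontier:
        value, index = heapq.heappop(frontier)
        yield value
        # The children of the yielded node become candidates.
        for child in (2 * index + 1, 2 * index + 2):
            if child < len(heap):
                heapq.heappush(frontier, (heap[child], child))
```

Yielding the first k items costs O(k*log(k)) because the candidate heap never holds more than k+1 entries, so stopping early is cheap. A full traversal is still O(n*log(n)), not the O(n) asked about in the question.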

Traversing a heapified list
