Question

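I've been implementing my own heap module to help me understand the heap data structure. I understand how heaps work and are maintained, but my implementation is much slower than the standard Python heapq module when doing a heap sort: for a list of 100,000 elements, heapq takes 0.6s while my code takes 2s. (It was originally 2.6s. I cut it down to 2s by taking the len() call out of percDown and passing the length in instead, so len doesn't have to be computed every time the method calls itself.) Here is my implementation: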

def percDown(lst, start, end, node):
    # Moves the given node down the heap, starting at index start, until the heap property is
    # satisfied (no child may be smaller than its parent)
    iChild = 2 * start + 1
    i = start
    # if the node has reached the end of the heap (i.e. no children left),
    # return its index (we are done)
    if iChild > end - 1:
        return start
    #if the second child exists and is smaller than the first child, use that child index
    #for comparing later
    if iChild + 1 < end and lst[iChild + 1] < lst[iChild]:
        iChild += 1
    #if the smallest child is less than the node, it is the new parent
    if lst[iChild] < node:
        #move the child to the parent position
        lst[start] = lst[iChild]
        #continue recursively going through the child nodes of the
        # new parent node to find where node is meant to go
        i = percDown(lst, iChild, end, node)
    return i
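Each level of percDown is a separate Python function call, and CPython does not eliminate tail calls, so every level pays for a new frame. Below is a minimal sketch of the same sift-down written as a loop. perc_down_iter is a hypothetical name; it keeps percDown's contract (it returns the index where node belongs, and the caller writes node there). How much time this actually saves has not been measured here.

```python
import heapq


def perc_down_iter(lst, start, end, node):
    """Loop-based sift-down with the same contract as percDown:
    move smaller children up until `node` fits, then return the index
    where `node` belongs. The caller still does lst[i] = node."""
    i = start
    child = 2 * i + 1
    while child < end:
        # use the right child if it exists and is smaller than the left one
        if child + 1 < end and lst[child + 1] < lst[child]:
            child += 1
        # heap property already holds: node can stay at index i
        if not lst[child] < node:
            break
        # move the smaller child up one level and keep going below it
        lst[i] = lst[child]
        i = child
        child = 2 * i + 1
    return i


# usage: remove the minimum the same way popMin does
heap = [5, 3, 8, 1, 9, 2]
heapq.heapify(heap)      # any valid min-heap works as a starting point
smallest = heap[0]
last = heap.pop()
i = perc_down_iter(heap, 0, len(heap), last)
heap[i] = last
```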

popMin: pops the minimum value (lst[0]) and restores the heap

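def popMin(lst):
    length = len(lst)
    if length > 1:
        # "smallest" rather than "min", so the builtin min() isn't shadowed
        smallest = lst[0]
        # take the last element off and sift it down from the root
        ele = lst.pop()
        i = percDown(lst, 0, length - 1, ele)
        lst[i] = ele
        return smallest
    else:
        return lst.pop()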

heapify: turns a list into a heap in place

def heapify(lst):
    length = len(lst)
    # index of the last node that has at least one child
    # (integer division, so no math import is needed)
    iLastParent = (length - 2) // 2
    while iLastParent >= 0:
        ele = lst[iLastParent]
        i = percDown(lst, iLastParent, length, ele)
        lst[i] = ele
        iLastParent -= 1
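One way to check heapify's output is to assert the heap invariant directly. This sketch uses a hypothetical helper, is_min_heap, with the same 0-based index arithmetic as percDown (the children of k are at 2k + 1 and 2k + 2, so the parent of k is (k - 1) // 2). heapq.heapify is used only as a known-good reference.

```python
import heapq


def is_min_heap(lst):
    """Return True if no element is smaller than its parent.
    In a 0-based array heap, the parent of index k is (k - 1) // 2."""
    return all(not lst[k] < lst[(k - 1) // 2] for k in range(1, len(lst)))


data = [9, 4, 7, 1, 8, 2, 6]
print(is_min_heap(data))   # False: 4 (index 1) is smaller than its parent 9
heapq.heapify(data)
print(is_min_heap(data))   # True
```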

sort: sorts the given list using the methods above (not in place)

def sort(lst):
    # note: this empties lst as a side effect; the sorted values are returned in a new list
    heapify(lst)
    result = []
    for _ in range(len(lst)):
        result.append(popMin(lst))
    return result
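For a like-for-like baseline, here is a sketch of the same sort built on the stdlib functions. heapq_sort is a hypothetical name; heapq.heapify and heapq.heappop are the real stdlib calls.

```python
import heapq
import random


def heapq_sort(lst):
    """Same shape as sort() above, built on the stdlib heapq functions.
    Like sort(), it empties lst and returns the sorted values in a new list."""
    heapq.heapify(lst)
    # range(len(lst)) is evaluated once, before any element is popped
    return [heapq.heappop(lst) for _ in range(len(lst))]


data = [random.random() for _ in range(10_000)]
result = heapq_sort(data[:])  # pass a copy so that data itself survives
```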

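Am I needlessly increasing the complexity of the algorithm/heap construction, or is Python's heapq module just heavily optimized? I suspect the former, since 0.6s vs 2s is a huge difference.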

Answer 1

The Python heapq module uses a C extension. You won't beat C code from pure Python.

From the heapq module source code:

# If available, use C implementation
try:
    from _heapq import *
except ImportError:
    pass

See also the _heapqmodule.c C source.
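To confirm which implementation is actually loaded, you can inspect the function objects. This is a sketch assuming CPython: functions supplied by the C accelerator are built-ins (their __module__ is expected to be '_heapq'), while the pure-Python fallbacks defined in heapq.py are ordinary Python functions. Other interpreters may report something different.

```python
import heapq
import inspect

# Built-in (C) functions are reported by inspect.isbuiltin; the pure-Python
# fallbacks from heapq.py are regular functions and are not.
uses_c = inspect.isbuiltin(heapq.heappop)
print("heappop comes from module:", heapq.heappop.__module__)
print("C accelerator in use:", uses_c)
```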

Answer 2

0.6s vs. 2.6s is roughly a 4x difference. Is that "too big"?

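There isn't enough information to answer that. Getting the algorithm wrong could cause a 4x difference... but there's no way to tell without testing at different sizes.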

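For example, if you get a 1.2x difference at 1,000 elements, a 4x difference at 100,000 and a 12x difference at 1,000,000, then your algorithmic complexity is very likely worse, which means you probably did get something wrong, and that's something you need to fix.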

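On the other hand, if it's about a 4x difference at all three sizes, then you just have a bigger constant multiplier in your overhead. Most likely that's because you have an inner loop running in Python, while the (CPython) stdlib version uses the _heapq accelerator module to run the same loop in C, as explained in Martijn Pieters' answer. So you didn't get anything wrong. You can probably micro-optimize a bit, but ultimately you'd have to either rewrite the core of your code in C or run it in a JIT-optimized interpreter to get anywhere near the stdlib's speed. And really, if you're only writing this to understand the algorithm, you don't need to do that.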
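The size test described above could look roughly like this. It is a sketch under stated assumptions: py_heapsort is a hypothetical pure-Python stand-in for the question's sort(), the sizes are examples, and best-of-3 wall-clock timing is coarse. Swap in the real sort() to test the actual code.

```python
import heapq
import random
import time


def py_heapsort(lst):
    """Hypothetical stand-in for the question's sort(): a pure-Python
    min-heap sort (bottom-up heapify, then repeated pop-min).
    Like sort(), it empties lst and returns a new sorted list."""
    def sift_down(i, end, node):
        # move smaller children up until node fits, then place it
        child = 2 * i + 1
        while child < end:
            if child + 1 < end and lst[child + 1] < lst[child]:
                child += 1
            if not lst[child] < node:
                break
            lst[i] = lst[child]
            i = child
            child = 2 * i + 1
        lst[i] = node

    n = len(lst)
    for i in range((n - 2) // 2, -1, -1):   # bottom-up heapify
        sift_down(i, n, lst[i])
    result = []
    while lst:
        last = lst.pop()
        if lst:
            result.append(lst[0])           # current minimum
            sift_down(0, len(lst), last)    # refill the root
        else:
            result.append(last)             # it was the only element left
    return result


def heapq_sort(lst):
    heapq.heapify(lst)
    return [heapq.heappop(lst) for _ in range(len(lst))]


def best_time(sort_fn, data, repeats=3):
    """Best-of-N wall-clock time; each run sorts a fresh copy of data."""
    best = float("inf")
    for _ in range(repeats):
        copy = data[:]
        t0 = time.perf_counter()
        sort_fn(copy)
        best = min(best, time.perf_counter() - t0)
    return best


ratios = {}
for n in (1_000, 10_000, 100_000):
    data = [random.random() for _ in range(n)]
    ratios[n] = best_time(py_heapsort, data) / best_time(heapq_sort, data)
    print(f"n={n:>7}: pure Python / heapq = {ratios[n]:.1f}x")
```

If the printed ratio stays roughly flat as n grows, the gap is a constant factor; if it keeps climbing, suspect a complexity problem.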

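As a side note, you might want to try running the comparison under PyPy. Most of its stdlib is written in pure Python with no special optimizations, but its optimizing JIT compiler makes that code nearly as fast as the native C code in CPython. The same JIT applies to your code, which means your unoptimized code will often run nearly as fast as CPython's native C code too. Of course there's no guarantee of that, and it doesn't change the fact that you always need to test at several sizes if you're trying to measure algorithmic complexity.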

Why is my binary heap implementation slower than Python's stdlib?
