Question

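I am trying to write a counting sort in Python that beats the built-in timsort in certain situations. Right now it beats the built-in sorted function, but only for very large arrays (1 million integers or longer; I haven't tried more than 10 million) and only when the range of values is no larger than 10,000. Additionally, the victory is narrow: the counting sort only wins by a significant margin on random lists specifically tailored to it.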

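I have read about the astounding performance gains that can come from vectorizing Python code, but I don't really understand how to do it or how it could be applied here. I would like to know how to vectorize this code to speed it up, and any other performance suggestions are welcome.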

The current fastest version using only Python and the standard library:

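from itertools import chain, repeat

def untimed_countsort(unsorted_list):
    # Tally how many times each value occurs.
    counts = {}
    for num in unsorted_list:
        try:
            counts[num] += 1
        except KeyError:
            counts[num] = 1

    # Emit each value in ascending order, repeated by its count.
    # counts.get(num, 0) skips values that are missing from the input.
    sorted_list = list(
        chain.from_iterable(
            repeat(num, counts.get(num, 0))
            for num in range(min(counts), max(counts) + 1)))  # xrange on Python 2
    return sorted_list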

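All that matters here is raw speed, so sacrificing even more space for speed is completely fair game.
I realize the code is already fairly short and clear, so I don't know how much room there is to improve its speed.
If anyone has a change that makes the code shorter without making it slower, that would be great as well.
Execution time is down by almost 80%! It now tests about three times as fast as Timsort!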
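
The timings quoted above depend heavily on the input distribution and on the machine. A minimal sketch of how such a comparison could be set up with timeit follows; the input size, the value range, and the helper name count_sort are assumptions chosen for illustration, not the original test setup.

```python
import random
import timeit
from itertools import chain, repeat

def count_sort(values):
    # Hypothetical stand-in for the counting sort under test.
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return list(chain.from_iterable(
        repeat(v, counts.get(v, 0))
        for v in range(min(counts), max(counts) + 1)))

# Assumed workload: 100,000 integers drawn from [0, 10000).
random.seed(0)
data = [random.randrange(10000) for _ in range(100000)]

# Check correctness before timing anything.
assert count_sort(data) == sorted(data)

# Take the best of several runs to reduce noise.
t_count = min(timeit.repeat(lambda: count_sort(data), number=1, repeat=3))
t_builtin = min(timeit.repeat(lambda: sorted(data), number=1, repeat=3))
print("counting sort: %.4f s, sorted(): %.4f s" % (t_count, t_builtin))
```

Results from a harness like this are only comparable when both functions sort the same data on the same interpreter.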

The absolute fastest way to do this, by a LONG shot, is this one-liner with numpy:

import numpy

def np_sort(unsorted_np_array):
    return numpy.repeat(numpy.arange(1+unsorted_np_array.max()), numpy.bincount(unsorted_np_array))

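This is about 10-15 times faster than the pure Python version and about 40 times faster than Timsort. It takes a numpy array as input and returns a numpy array.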
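
One caveat: numpy.bincount only accepts non-negative integers, so the one-liner fails on arrays that contain negative values. A possible workaround, sketched here under the assumption that the input is a non-empty integer array, is to shift by the minimum before counting. The name np_sort_offset is made up for this example.

```python
import numpy

def np_sort_offset(arr):
    # Hypothetical variant of np_sort that tolerates negative integers.
    arr = numpy.asarray(arr)
    lo = arr.min()
    # Shift so the smallest value maps to bin 0, then count occurrences.
    counts = numpy.bincount(arr - lo)
    # Rebuild the sorted array by repeating each value by its count.
    return numpy.repeat(numpy.arange(lo, lo + len(counts)), counts)

print(np_sort_offset([3, -2, 0, -2, 5]))
```

The shift also keeps the count array small when all values are large but clustered, since its length depends on max - min rather than on max.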

Answer 1

With numpy, this function reduces to the following:

import numpy

def countsort(unsorted):
    unsorted = numpy.asarray(unsorted)
    return numpy.repeat(numpy.arange(1+unsorted.max()), numpy.bincount(unsorted))

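When I tried it on 100,000 random integers from the interval [0, 10000), it ran about 40 times faster. bincount does the counting, and repeat converts the counts into a sorted array.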
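
To make the two steps concrete, here is a small walk-through on a toy input (the sample values are chosen only for illustration):

```python
import numpy

unsorted = numpy.asarray([3, 1, 0, 3, 1, 3])

# bincount: counts[i] is how many times the value i occurs.
counts = numpy.bincount(unsorted)
print(counts.tolist())   # [1, 2, 0, 3]

# arange lists every candidate value from 0 to max inclusive.
values = numpy.arange(1 + unsorted.max())
print(values.tolist())   # [0, 1, 2, 3]

# repeat emits values[i] exactly counts[i] times, which is sorted order.
result = numpy.repeat(values, counts)
print(result.tolist())   # [0, 1, 1, 3, 3, 3]
```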

Answer 2

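Leaving your algorithm aside, it helps to get rid of most of your pure Python loops (which are quite slow) and turn them into comprehensions or generators (generally faster than regular for blocks). Also, if you have to build a list made up of identical elements, the [x]*n syntax is probably the fastest way to do it. sum is used to flatten the list of lists.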

from collections import defaultdict

def countsort(unsorted_list):
    lmin, lmax = min(unsorted_list), max(unsorted_list) + 1
    counts = defaultdict(int)
    for j in unsorted_list:
        counts[j] += 1
    # The [] start value is required so that sum concatenates lists.
    return sum([[num]*counts[num] for num in range(lmin, lmax) if num in counts], [])

Note that this is not vectorized, nor does it use numpy.
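
As a side note, flattening with sum builds a new list at every step, so it can become slow when there are many distinct values. A sketch of the same idea using collections.Counter and itertools.chain is shown below; this is a swapped-in variant rather than the answer's original code, and the name countsort_chain is made up here.

```python
from collections import Counter
from itertools import chain

def countsort_chain(unsorted_list):
    # Counter does the counting loop in one call.
    counts = Counter(unsorted_list)
    lmin, lmax = min(counts), max(counts) + 1
    # [num] * n builds each run; chain joins the runs in one pass.
    # Counter returns 0 for missing keys, so gaps yield empty runs.
    return list(chain.from_iterable(
        [num] * counts[num] for num in range(lmin, lmax)))

print(countsort_chain([5, 2, 9, 2, 5, 5]))  # [2, 2, 5, 5, 5, 9]
```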

How can I vectorize this Python counting sort so that it is as fast as possible?
