Question

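Is there an objective test for finding the best sorting algorithm for a specific type of list? I have attempted such a test, but I am not sure how robust it is. The crux may be this: can an objective test be designed that generalizes to the best algorithm for a type of list, or do these decisions require empirical evidence?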

Problem

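I am trying to find the best sorting algorithm for a specific type of list. Each list contains 2-202 items, all unique integers. I am looking for the fastest way to sort millions of such lists.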

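This search began when I noticed that Python's built-in Tim Sort, implemented in C and called as sorted(unsorted), was only slightly faster than my naive test simple_sort(unsorted_set, order), which is written in pure Python. Interestingly, quick_sort in Python is not always faster than simple_sort: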

>>> def simple_sort(unsorted_set, order):
...     sorted_list = []
...     for i in order:
...         if i in unsorted_set:
...             sorted_list.append(i)
...     return sorted_list
>>> unsorted = [1, 5, 2, 9]
>>> order = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> unsorted_set = {item for item in unsorted}
>>> print simple_sort(unsorted_set, order)
[1, 2, 5, 9]

At some point the algorithm I need will be rewritten in C, once I am comfortable enough with C to do so.

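I have made an effort to write the following test code, on the assumption that simple_sort in C will outperform Tim Sort for my specific type of list.
I assume sorted(unsorted) is the C implementation of Tim Sort.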

Results for sorting slices of a sorted list

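Counting Sort was the fastest.
Among the fastest algorithms I tested, a group that includes my naive Python solution simple_sort, Tim Sort in C was the slowest.
Interestingly, the winning algorithms (below) cluster into 3 groups.
The first test mistakenly sorted lists that were already sorted. I have added a test for unsorted lists (below).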
The Excel file

[Figures: fastest algorithms for 2-202 sorted items; slowest algorithms for 2-202 sorted items]

Results for sorting slices of an unsorted list

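For my lists shorter than 203 items, Tim Sort in C is the fastest.
For lists longer than about 475 items, simple_sort is faster.
I added this section for unsorted lists because the in-place gnome_sort in my first test pre-sorted the input list.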
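
The pre-sorting pitfall described above applies to any in-place sort that is timed repeatedly against the same list object: after the first repetition, the input is already sorted. One way to guard against it is to hand every repetition a fresh copy, as in this minimal sketch. It assumes Python 3; insertion_sort is a hypothetical stand-in for the algorithms under test, and time_with_fresh_copy is an illustrative helper name, not part of the linked test code.

```python
import random
import timeit


def insertion_sort(items):
    """Hypothetical in-place sort, used only as a stand-in for illustration."""
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current
    return items


def time_with_fresh_copy(sort_func, unsorted, repetitions=100):
    """Time sort_func so that every repetition sees the same unsorted input.

    The list copy is part of the measurement, but its cost is roughly the
    same for every algorithm being compared.
    """
    return timeit.timeit(lambda: sort_func(unsorted[:]), number=repetitions)


random.seed(0)
unsorted = random.sample(range(1000), 200)
original = unsorted[:]

elapsed = time_with_fresh_copy(insertion_sort, unsorted)
# The input list itself is never mutated, so it stays unsorted for the
# next algorithm to be timed.
input_unchanged = (unsorted == original)
```

Because the copy happens inside the timed callable, the absolute times include a small constant overhead; the relative ranking of algorithms is what this arrangement is meant to protect.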

[Figures: fastest algorithms for 2-202 unsorted items; fastest algorithms for 2-1002 unsorted items; slowest algorithms for 2-202 unsorted items; slowest algorithms for 2-1002 unsorted items]

Test code

I have linked the full sorting test code here, because including the sorting algorithms themselves would make this post too long.

import random
import timeit
from collections import defaultdict

# 'times' is how many repetitions each algorithm should run
times = 100
# 'unique_items' is the number of non-redundant items in the unsorted list
unique_items = 1003
# 'order' is an ordered list
order = list(range(unique_items))
# Generate the unsorted list
random_list = order[:]
random.shuffle(random_list)
# 'random_set' is used for simple_sort
random_simple = random_list[:]
random_set = {item for item in random_simple}

# A list of all sorted lists for each algorithm   
sorted_lists = [
    simple_sort(random_set, order),
    quick_sort(random_list[:]),
    merge_sort(random_list[:]),
    shell_sort(random_list[:]),
    bubble_sort(random_list[:]),
    heap_sort(random_list[:]),
    insertion_sort(random_list[:]),
    insertion_sort_bin(random_list[:]),
    circle_sort(random_list[:]),
    cocktail_sort(random_list[:]),
    counting_sort(random_list[:], 0, unique_items),
    cycle_sort(random_list[:]),
    gnome_sort(random_list[:]),
    pancake_sort(random_list[:]),
    patience_sort(random_list[:]),
    radix_sort(random_list[:], unique_items),
    selection_sort(random_list[:]),
    abstract_tree_sort(random_list[:], BinarySearchTree),
    sorted(random_list[:])
    ]

# A set of all sorted lists for each algorithm
sorted_set = {repr(item) for item in sorted_lists}
# If only one version of the sorted list exists, True is evaluated
print 'All algorithms sort identically', len(sorted_set) == 1

# Sort slices of an unsorted list and record the times in 'time_record'
time_record = defaultdict(list)
for length in range(2, unique_items, 10):
    unsorted = random_list[:length]
    # 'unsorted_set' is used for simple_sort
    simple_unsorted = unsorted[:]
    unsorted_set = {item for item in simple_unsorted}

    print '**********', length, '**********'    

    print 'simple'
    simple = timeit.timeit(lambda: simple_sort(unsorted_set, order), number=times)
    time_record['Simple Sort'].append(simple)    

    print 'quick'
    quick_unsorted = unsorted[:]
    quick = timeit.timeit(lambda: quick_sort(quick_unsorted), number=times)
    time_record['Quick Sort'].append(quick)

    print 'merge'
    merge_unsorted = unsorted[:]
    merged = timeit.timeit(lambda: merge_sort(merge_unsorted), number=times)
    time_record['Merge Sort'].append(merged)

    print 'shell'
    shell_unsorted = unsorted[:]
    shell = timeit.timeit(lambda: shell_sort(shell_unsorted), number=times)
    time_record['Shell Sort'].append(shell)

    print 'bubble'
    bubble_unsorted = unsorted[:]
    bubble = timeit.timeit(lambda: bubble_sort(bubble_unsorted), number=times)
    time_record['In Place Bubble Sort'].append(bubble)    

    print 'heap'
    heap_unsorted = unsorted[:]
    heap = timeit.timeit(lambda: heap_sort(heap_unsorted), number=times)
    time_record['In Place Heap Sort'].append(heap)

    print 'insertion'
    insertion_unsorted = unsorted[:]
    insertion = timeit.timeit(lambda: insertion_sort(insertion_unsorted), number=times)
    time_record['In Place Insertion Sort'].append(insertion)

    print 'insertion binary'
    insertion_bin_unsorted = unsorted[:]
    insertion_bin = timeit.timeit(lambda: insertion_sort_bin(insertion_bin_unsorted), number=times)
    time_record['In Place Insertion Sort Binary'].append(insertion_bin)

    print 'circle'
    circle_unsorted = unsorted[:]
    circle = timeit.timeit(lambda: circle_sort(circle_unsorted), number=times)
    time_record['In Place Circle Sort'].append(circle)

    print 'cocktail'
    cocktail_unsorted = unsorted[:]
    cocktail = timeit.timeit(lambda: cocktail_sort(cocktail_unsorted), number=times)   
    time_record['In Place Cocktail Sort'].append(cocktail)

    print 'counting'
    counting_unsorted = unsorted[:]
    counting = timeit.timeit(lambda: counting_sort(counting_unsorted, 0, length), number=times)
    time_record['Counting Sort'].append(counting)

    print 'cycle'
    cycle_unsorted = unsorted[:]
    cycle = timeit.timeit(lambda: cycle_sort(cycle_unsorted), number=times)
    time_record['In Place Cycle Sort'].append(cycle)

    print 'gnome'
    gnome_unsorted = unsorted[:]
    gnome = timeit.timeit(lambda: gnome_sort(gnome_unsorted), number=times)
    time_record['Gnome Sort'].append(gnome)

    print 'pancake'
    pancake_unsorted = unsorted[:]
    pancake = timeit.timeit(lambda: pancake_sort(pancake_unsorted), number=times)
    time_record['In Place Pancake Sort'].append(pancake)

    print 'patience'
    patience_unsorted = unsorted[:]
    patience = timeit.timeit(lambda: patience_sort(patience_unsorted), number=times)
    time_record['In Place Patience Sort'].append(patience)

    print 'radix'
    radix_unsorted = unsorted[:]
    radix = timeit.timeit(lambda: radix_sort(radix_unsorted, length), number=times)
    time_record['Radix Sort'].append(radix)

    print 'selection'
    selection_unsorted = unsorted[:]
    selection = timeit.timeit(lambda: selection_sort(selection_unsorted), number=times)
    time_record['Selection Sort'].append(selection)

    print 'tree'
    tree_unsorted = unsorted[:]
    tree_sorted = timeit.timeit(lambda: abstract_tree_sort(tree_unsorted, BinarySearchTree), number=times)
    time_record['Abstract Tree Sort'].append(tree_sorted)

    print 'tim in c'
    tim_unsorted = unsorted[:]
    tim = timeit.timeit(lambda: sorted(tim_unsorted), number=times)
    time_record['Tim in C'].append(tim)
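
Once time_record is filled, the winning algorithm at each list length can be read off by comparing the recorded times position by position. The sketch below assumes Python 3 and a record shaped like time_record above (a name mapped to a list of timings, one per tested length). The helper fastest_per_length, the names algo_a to algo_c, and every number in it are made up for illustration; none of them are measured results.

```python
from collections import defaultdict


def fastest_per_length(time_record, lengths):
    """Return {length: name of the algorithm with the smallest recorded time}."""
    winners = {}
    for index, length in enumerate(lengths):
        winners[length] = min(time_record, key=lambda name: time_record[name][index])
    return winners


# Illustrative, made-up timings in seconds for three list lengths.
example_record = defaultdict(list)
example_record['algo_a'] = [0.030, 0.031, 0.032]
example_record['algo_b'] = [0.010, 0.040, 0.090]
example_record['algo_c'] = [0.005, 0.020, 0.050]

# Same spacing as range(2, unique_items, 10) in the test loop.
example_lengths = [2, 12, 22]
winners = fastest_per_length(example_record, example_lengths)
# winners == {2: 'algo_c', 12: 'algo_c', 22: 'algo_a'}
```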

Answer 1

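The best sorting algorithm depends on various factors, including properties of the input (e.g. the size of the elements) and requirements on the result (e.g. stability). For a given input set, Bubblesort can be exceptionally fast at O(N), while Quicksort can be exceptionally slow at O(N x N), and Mergesort will always be O(N x logN).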
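
The effect of input order on cost can be made concrete by counting comparisons. The sketch below assumes Python 3, a bubble sort with early exit, and a quicksort that always takes the first element as its pivot. Both are illustrative variants chosen for this example, not the only ones, and the function names are hypothetical. The quicksort helper only counts the comparisons such a quicksort would make; it does not return the sorted list.

```python
def bubble_sort_comparisons(items):
    """Bubble sort with early exit; returns the number of comparisons made."""
    items = list(items)
    comparisons = 0
    n = len(items)
    while True:
        swapped = False
        for i in range(n - 1):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:
            return comparisons
        n -= 1  # the largest remaining item is now in its final place


def quick_sort_comparisons(items):
    """Count comparisons made by a quicksort that pivots on the first element.

    Uses an explicit stack instead of recursion so that the degenerate
    case does not hit Python's recursion limit.
    """
    comparisons = 0
    stack = [list(items)]
    while stack:
        part = stack.pop()
        if len(part) < 2:
            continue
        pivot, rest = part[0], part[1:]
        smaller, larger = [], []
        for value in rest:
            comparisons += 1
            (smaller if value < pivot else larger).append(value)
        stack.append(smaller)
        stack.append(larger)
    return comparisons


already_sorted = list(range(100))
bubble_count = bubble_sort_comparisons(already_sorted)  # 99, i.e. N - 1
quick_count = quick_sort_comparisons(already_sorted)    # 4950, i.e. N * (N - 1) / 2
```

On this input the usually slow algorithm does linear work and the usually fast one does quadratic work, which is why the shape of the real data matters.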

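In the general case, sorting is in O(N x logN), i.e. no comparison-based algorithm can sort an arbitrary set faster than that. However, for certain input characteristics there are sorting algorithms whose running time is linear in the size of the input set. Obviously, you cannot be faster than that.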
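
As one illustration of such an input characteristic: when the items are unique non-negative integers below a known bound, a simplified counting sort can mark which values are present and then read them back in order. This is a minimal sketch assuming Python 3; presence_sort and upper_bound are hypothetical names introduced for this example.

```python
def presence_sort(items, upper_bound):
    """Sort unique non-negative integers that are all below upper_bound.

    Runs in O(len(items) + upper_bound) time and never compares two
    items with each other.
    """
    present = [False] * upper_bound
    for value in items:
        present[value] = True
    return [value for value, seen in enumerate(present) if seen]


result = presence_sort([5, 1, 9, 2], upper_bound=10)
# result == [1, 2, 5, 9]
```

The cost depends on the bound as well as on the number of items, so this approach only pays off when the range of possible values is not much larger than the lists being sorted.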

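If you know little about sorting, your best bet is simply to compare some common sorting algorithms. Since your input consists of "unique integers", you do not need to care whether the sorting algorithm is stable.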

Try the following algorithms on your actual data and pick the fastest:

Mergesort
Bubblesort
Quicksort
Radix sort
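
A comparison of this kind could be sketched as follows. The code assumes Python 3; pick_fastest is a hypothetical helper, merge_sort is a plain textbook implementation included only so the example is self-contained, and the random sample lists stand in for real data.

```python
import random
import timeit


def merge_sort(items):
    """Textbook top-down merge sort returning a new list (illustrative only)."""
    if len(items) < 2:
        return list(items)
    middle = len(items) // 2
    left, right = merge_sort(items[:middle]), merge_sort(items[middle:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def pick_fastest(candidates, sample_lists, repetitions=5):
    """Return (best_name, timings) for sort functions tried on sample_lists.

    candidates maps a name to a function that returns the sorted list.
    Each function is first checked against sorted() so that a fast but
    wrong implementation cannot win.
    """
    timings = {}
    for name, sort_func in candidates.items():
        for sample in sample_lists:
            if sort_func(sample[:]) != sorted(sample):
                raise ValueError('%s does not sort correctly' % name)

        def run_all(func=sort_func):
            for sample in sample_lists:
                func(sample[:])  # fresh copy, so in-place sorts see unsorted input

        # The minimum of several runs is less sensitive to background noise.
        timings[name] = min(timeit.repeat(run_all, number=10, repeat=repetitions))
    return min(timings, key=timings.get), timings


random.seed(0)
samples = [random.sample(range(1000), random.randint(2, 202)) for _ in range(20)]
best_name, timings = pick_fastest({'merge': merge_sort, 'builtin': sorted}, samples)
```

Which name ends up in best_name depends on the machine, the interpreter, and the sample lists, so the outcome is only meaningful for data that resembles the real workload.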

If the total number of possible inputs is "small", you could even skip sorting altogether and precompute all possible results.
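
With unique items, the sorted result is fully determined by which items are present, so a precomputed table can be keyed on the set of items. The sketch below assumes Python 3 and a deliberately tiny universe of ten possible values; build_lookup is a hypothetical name introduced for illustration.

```python
from itertools import combinations


def build_lookup(universe):
    """Precompute the sorted result for every non-empty subset of universe.

    The table has 2**len(universe) - 1 entries, so this is only practical
    when the universe of possible items is very small.
    """
    ordered = sorted(universe)
    table = {}
    for size in range(1, len(ordered) + 1):
        for subset in combinations(ordered, size):
            # combinations() preserves the order of its input, so each
            # subset is already sorted.
            table[frozenset(subset)] = list(subset)
    return table


lookup = build_lookup(range(10))
result = lookup[frozenset([5, 1, 9, 2])]
# result == [1, 2, 5, 9]
```

The table size doubles with every additional possible value, which is the practical limit on what counts as "small" here.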
