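Is there an objective test to find the best sorting algorithm for a specific type of list?
I have attempted such a test, but I am not sure how robust it is. The crux may be: can objective tests be designed that generalize to the best algorithm for a type of list, or do these decisions require empirical evidence?
I am trying to find the best sorting algorithm for a specific type of list. Each list contains 2-202 items, all of them unique integers. I am trying to find the fastest way to sort millions of such lists.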
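This search started when I noticed that the built-in Tim sort, implemented in C for Python, sorted(unsorted), is only slightly faster than my naive test in Python, simple_sort(unsorted_set, order). Interestingly, quick_sort in Python is not always faster than simple_sort: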
>>> def simple_sort(unsorted_set, order):
...     sorted_list = []
...     for i in order:
...         if i in unsorted_set:
...             sorted_list.append(i)
...     return sorted_list
...
>>> unsorted = [1, 5, 2, 9]
>>> order = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> unsorted_set = {item for item in unsorted}
>>> print simple_sort(unsorted_set, order)
[1, 2, 5, 9]
At some point, the sorting algorithm I settle on will be rewritten in C, once I am comfortable enough in C to do so.
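simple_sort outperforms Tim Sort for my specific type of list. sorted(unsorted) is the C implementation of Tim Sort.
[Figure: Fastest algorithms for 2-202 sorted items]
[Figure: Slowest algorithms for 2-202 sorted items]
simple_sort is faster. gnome_sort [...] pre-sorted input lists.
[Figure: Fastest algorithms for 2-202 unsorted items]
[Figure: Fastest algorithms for 2-1002 unsorted items]
[Figure: Slowest algorithms for 2-202 unsorted items]
[Figure: Slowest algorithms for 2-1002 unsorted items]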
I have linked the full sorting test code here, because including the sorting algorithms would make this post too long.
import random
import timeit
from collections import defaultdict

# 'times' is how many repetitions each algorithm should run
times = 100
# 'unique_items' is the number of non-redundant items in the unsorted list
unique_items = 1003
# 'order' is an ordered list
order = []
for number in range(unique_items):
    order.append(number)
# Generate the unsorted list
random_list = order[:]
random.shuffle(random_list)
# 'random_set' is used for simple_sort
random_simple = random_list[:]
random_set = {item for item in random_simple}
# A list of all sorted lists for each algorithm
sorted_lists = [
    simple_sort(random_set, order),
    quick_sort(random_list[:]),
    merge_sort(random_list[:]),
    shell_sort(random_list[:]),
    bubble_sort(random_list[:]),
    heap_sort(random_list[:]),
    insertion_sort(random_list[:]),
    insertion_sort_bin(random_list[:]),
    circle_sort(random_list[:]),
    cocktail_sort(random_list[:]),
    counting_sort(random_list[:], 0, unique_items),
    cycle_sort(random_list[:]),
    gnome_sort(random_list[:]),
    pancake_sort(random_list[:]),
    patience_sort(random_list[:]),
    radix_sort(random_list[:], unique_items),
    selection_sort(random_list[:]),
    abstract_tree_sort(random_list[:], BinarySearchTree),
    sorted(random_list[:])
]
# A set of all sorted lists for each algorithm
sorted_set = {repr(item) for item in sorted_lists}
# If only one version of the sorted list exists, True is evaluated
print 'All algorithms sort identically', len(sorted_set) == 1
# Sort slices of an unsorted list and record the times in 'time_record'
time_record = defaultdict(list)
for length in range(2, unique_items, 10):
    unsorted = random_list[:length]
    # 'unsorted_set' is used for simple_sort
    simple_unsorted = unsorted[:]
    unsorted_set = {item for item in simple_unsorted}
    print '**********', length, '**********'
    print 'simple'
    simple = timeit.timeit(lambda: simple_sort(unsorted_set, order), number=times)
    time_record['Simple Sort'].append(simple)
    print 'quick'
    quick_unsorted = unsorted[:]
    quick = timeit.timeit(lambda: quick_sort(quick_unsorted), number=times)
    time_record['Quick Sort'].append(quick)
    print 'merge'
    merge_unsorted = unsorted[:]
    merged = timeit.timeit(lambda: merge_sort(merge_unsorted), number=times)
    time_record['Merge Sort'].append(merged)
    print 'shell'
    shell_unsorted = unsorted[:]
    shell = timeit.timeit(lambda: shell_sort(shell_unsorted), number=times)
    time_record['Shell Sort'].append(shell)
    print 'bubble'
    bubble_unsorted = unsorted[:]
    bubble = timeit.timeit(lambda: bubble_sort(bubble_unsorted), number=times)
    time_record['In Place Bubble Sort'].append(bubble)
    print 'heap'
    heap_unsorted = unsorted[:]
    heap = timeit.timeit(lambda: heap_sort(heap_unsorted), number=times)
    time_record['In Place Heap Sort'].append(heap)
    print 'insertion'
    insertion_unsorted = unsorted[:]
    insertion = timeit.timeit(lambda: insertion_sort(insertion_unsorted), number=times)
    time_record['In Place Insertion Sort'].append(insertion)
    print 'insertion binary'
    insertion_bin_unsorted = unsorted[:]
    insertion_bin = timeit.timeit(lambda: insertion_sort_bin(insertion_bin_unsorted), number=times)
    time_record['In Place Insertion Sort Binary'].append(insertion_bin)
    print 'circle'
    circle_unsorted = unsorted[:]
    circle = timeit.timeit(lambda: circle_sort(circle_unsorted), number=times)
    time_record['In Place Circle Sort'].append(circle)
    print 'cocktail'
    cocktail_unsorted = unsorted[:]
    cocktail = timeit.timeit(lambda: cocktail_sort(cocktail_unsorted), number=times)
    time_record['In Place Cocktail Sort'].append(cocktail)
    print 'counting'
    counting_unsorted = unsorted[:]
    counting = timeit.timeit(lambda: counting_sort(counting_unsorted, 0, length), number=times)
    time_record['Counting Sort'].append(counting)
    print 'cycle'
    cycle_unsorted = unsorted[:]
    cycle = timeit.timeit(lambda: cycle_sort(cycle_unsorted), number=times)
    time_record['In Place Cycle Sort'].append(cycle)
    print 'gnome'
    gnome_unsorted = unsorted[:]
    gnome = timeit.timeit(lambda: gnome_sort(gnome_unsorted), number=times)
    time_record['Gnome Sort'].append(gnome)
    print 'pancake'
    pancake_unsorted = unsorted[:]
    pancake = timeit.timeit(lambda: pancake_sort(pancake_unsorted), number=times)
    time_record['In Place Pancake Sort'].append(pancake)
    print 'patience'
    patience_unsorted = unsorted[:]
    patience = timeit.timeit(lambda: patience_sort(patience_unsorted), number=times)
    time_record['In Place Patience Sort'].append(patience)
    print 'radix'
    radix_unsorted = unsorted[:]
    radix = timeit.timeit(lambda: radix_sort(radix_unsorted, length), number=times)
    time_record['Radix Sort'].append(radix)
    print 'selection'
    selection_unsorted = unsorted[:]
    selection = timeit.timeit(lambda: selection_sort(selection_unsorted), number=times)
    time_record['Selection Sort'].append(selection)
    print 'tree'
    tree_unsorted = unsorted[:]
    tree_sorted = timeit.timeit(lambda: abstract_tree_sort(tree_unsorted, BinarySearchTree), number=times)
    time_record['Abstract Tree Sort'].append(tree_sorted)
    print 'tim in c'
    tim_unsorted = unsorted[:]
    tim = timeit.timeit(lambda: sorted(tim_unsorted), number=times)
    time_record['Tim in C'].append(tim)
Answer 0 (score: 1)
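The best sorting algorithm depends on various factors, including properties of the input (for example, the size of the elements) and requirements on the result (for example, stability). For a particular input set, Bubblesort can be exceptionally fast at O(N) (for example, on input that is already sorted), Quicksort can be exceptionally slow at O(N x N), while Mergesort is always O(N x log N).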
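To make the best-case and worst-case claims concrete, here is a minimal sketch that counts comparisons instead of measuring time. Both functions are illustrative assumptions, not the code from the question: the Bubblesort variant stops after a pass without swaps, and the Quicksort variant always picks the first element as pivot, which is the choice that degrades on sorted input. It is written for Python 3.

```python
def bubble_sort_comparisons(items):
    """Bubble sort with early exit; returns (sorted list, comparison count)."""
    data = list(items)
    comparisons = 0
    n = len(data)
    while True:
        swapped = False
        for i in range(1, n):
            comparisons += 1
            if data[i - 1] > data[i]:
                data[i - 1], data[i] = data[i], data[i - 1]
                swapped = True
        n -= 1  # the largest remaining item is now in its final place
        if not swapped:
            break
    return data, comparisons


def quick_sort_comparisons(items):
    """Quicksort with first-element pivot; returns (sorted list, comparison count)."""
    data = list(items)
    if len(data) <= 1:
        return data, 0
    pivot, rest = data[0], data[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    left, left_count = quick_sort_comparisons(smaller)
    right, right_count = quick_sort_comparisons(larger)
    # Each element of 'rest' is counted as one comparison against the pivot.
    return left + [pivot] + right, len(rest) + left_count + right_count


already_sorted = list(range(100))
print(bubble_sort_comparisons(already_sorted)[1])  # 99, i.e. N - 1
print(quick_sort_comparisons(already_sorted)[1])   # 4950, i.e. N * (N - 1) / 2
```

On the same sorted input of 100 items, the early-exit Bubblesort needs one pass while this Quicksort variant compares every pair, which is why the shape of your real data matters more than the textbook average case.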
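In the general case, comparison-based sorting is O(N x log N), i.e. no algorithm can sort arbitrary sets faster than that. However, for certain input characteristics there are sorting algorithms that are linear in the size of the input set. Obviously, you cannot be faster than that.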
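As an illustration of such a linear algorithm, here is a minimal sketch that exploits two input characteristics: the values are unique, and they are non-negative integers below a known bound. The function name presence_sort and the parameter upper_bound are hypothetical, chosen for this example only. It is written for Python 3.

```python
def presence_sort(items, upper_bound):
    """Sort unique non-negative integers below upper_bound.

    Runs in O(N + upper_bound): one pass to mark which values occur,
    one pass over the flags to read them back in increasing order.
    """
    present = [False] * upper_bound
    for value in items:
        present[value] = True
    return [value for value, seen in enumerate(present) if seen]


print(presence_sort([1, 5, 2, 9], 10))  # [1, 2, 5, 9]
```

This is close in spirit to the simple_sort from the question, which walks a fixed order and keeps the members of a set. The cost depends on the size of the value range, so it only pays off when that range is not much larger than the lists themselves.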
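If you know little about sorting, your best bet is to simply compare some common sorting algorithms. Since your input consists of "unique integers", you do not need to care whether the sorting algorithm is stable.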
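One way to set up such a comparison is sketched below, assuming Python 3. The helper benchmark, the sample generator and the hand-written insertion_sort are hypothetical stand-ins; you would substitute your own candidates and, ideally, lists taken from your real data. Each call sorts a fresh copy, so an in-place algorithm never gets timed on a list it has already sorted.

```python
import random
import timeit


def insertion_sort(items):
    """Plain in-place insertion sort, included only as a comparison candidate."""
    for i in range(1, len(items)):
        value = items[i]
        j = i - 1
        while j >= 0 and items[j] > value:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = value
    return items


def benchmark(candidates, samples, repeat=5):
    """Return {name: best time in seconds} for sorting every sample once."""
    results = {}
    for name, sort_function in candidates.items():
        def run():
            for sample in samples:
                # Fresh copy per call; the copy cost is the same for every candidate.
                sort_function(list(sample))
        results[name] = min(timeit.repeat(run, number=1, repeat=repeat))
    return results


# Assumption: random lists of 2-202 unique integers stand in for the real data.
random.seed(0)
samples = [random.sample(range(1000), random.randint(2, 202)) for _ in range(200)]
candidates = {"built-in sorted": sorted, "insertion sort": insertion_sort}
timings = benchmark(candidates, samples)
for name, seconds in sorted(timings.items(), key=lambda pair: pair[1]):
    print("%-16s %.4f s" % (name, seconds))
```

Taking the minimum over several repeats reduces noise from other processes. The ranking is only meaningful for inputs that resemble the samples, so it is worth repeating the run with sorted, reversed and nearly sorted lists if those occur in practice.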
Try the following algorithms on your actual data and choose the fastest:
If the total number of possible inputs is "small", you could even skip sorting and precompute all possible results.
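A minimal sketch of the precomputation idea, assuming Python 3 and a deliberately tiny universe of ten values: every possible subset is mapped to its sorted list once, and "sorting" then becomes a dictionary lookup. The name build_sorted_lookup is hypothetical.

```python
from itertools import combinations


def build_sorted_lookup(universe):
    """Map every subset of 'universe' (as a frozenset) to its sorted list."""
    ordered = sorted(universe)
    table = {}
    for size in range(len(ordered) + 1):
        # combinations() keeps the order of its input, so each tuple is sorted.
        for subset in combinations(ordered, size):
            table[frozenset(subset)] = list(subset)
    return table


lookup = build_sorted_lookup(range(10))
print(len(lookup))                       # 1024, i.e. 2 ** 10 subsets
print(lookup[frozenset([5, 1, 9, 2])])   # [1, 2, 5, 9]
```

The table doubles in size with every additional value in the universe, so this is only realistic when "small" really means a few dozen possible values at most, or when the set of inputs that actually occur is far smaller than the set of all subsets.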