问题

Question

我正在学习众所周知的排序算法，并自己实现。我最近完成了合并排序，我的代码是：

def merge(l, r, direction):
    # print('merging')
    # print(l, r)
    # adding infinity to end of list so we know when we've hit the bottom of one pile
    l.append(inf)
    r.append(inf)
    A = []
    i, j = 0, 0

    while (i < len(l)) and (j < len(r)):
        if l[i] <= r[j]:
            A.append(l[i])
            i += 1
        else:
            A.append(r[j])
            j += 1
    # removing infinity from end of list
    A.pop()

    return(A)


def merge_sort(num_lst, direction='increasing', level=0):
    if len(num_lst) > 1:
        mid = len(num_lst)//2
        l = num_lst[:mid]
        r = num_lst[mid:]
        l = merge_sort(l, level=level + 1)
        r = merge_sort(r, level=level + 1)
        num_lst = merge(l, r, direction)

    return num_lst

我在其他实现中看到的与我自己的实现不同的是列表的合并。在这里，我只是创建一个空白列表，然后按数字顺序附加元素，然后将预先存在的元素传递给merge并覆盖每个元素以按数字顺序创建列表。像这样：

def merge(arr, l, m, r): 
    n1 = m - l + 1
    n2 = r- m 

    # create temp arrays 
    L = [0] * (n1) 
    R = [0] * (n2) 

    # Copy data to temp arrays L[] and R[] 
    for i in range(0 , n1): 
        L[i] = arr[l + i] 

    for j in range(0 , n2): 
        R[j] = arr[m + 1 + j] 

    # Merge the temp arrays back into arr[l..r] 
    i = 0     # Initial index of first subarray 
    j = 0     # Initial index of second subarray 
    k = l     # Initial index of merged subarray 

    while i < n1 and j < n2 : 
        if L[i] <= R[j]: 
            arr[k] = L[i] 
            i += 1
        else: 
            arr[k] = R[j] 
            j += 1
        k += 1

    # Copy the remaining elements of L[], if there 
    # are any 
    while i < n1: 
        arr[k] = L[i] 
        i += 1
        k += 1

    # Copy the remaining elements of R[], if there 
    # are any 
    while j < n2: 
        arr[k] = R[j] 
        j += 1
        k += 1

我对以下内容感到好奇：

问题

在空白列表上使用append()是个坏主意吗？根据我的理解，当python创建一个列表时，它会捕获一定大小的内存，如果我们的列表超出了该范围，则会将该列表复制到另一个更大的内存区域中（如果即使一次为一个事件发生，这似乎会付出很高的代价）大名单，更不用说重复了）与按索引访问列表相比，使用append()的成本是否更高？在我看来，append能够以相当低的成本进行操作。

Answer 1

当实例化一个列表时，Python将分配用于保存项目所需的内存以及一些额外的内存，以供将来追加/扩展。当您在列表中放入过多的东西时，最终需要重新分配它，这可能会减慢程序的速度（请参见源代码的this part）。重新分配发生here，其大小计算为：

new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

the comment指出增长模式为：0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...。

因此，如果预先知道最终列表的大小并从头开始分配它，将为您节省一些额外的重新分配（从而节省计算时间）。

Answer 2

from time import time 
append_arr = []
index_arr = [None] * 10240*1024

t0 = time()
for x in range(10240*1024):
    append_arr.append(x)
t1 = time()

t2 = time()
for i in range(10240*1024):
    index_arr[i] = i
t3 = time()

print(str(t1-t0))
print(str(t3-t2))

好像追加比较慢。

合并排序python实现方法

问题

2 个答案: