python中的多进程快速排序会挂起大量元素

时间:2012-09-16 23:42:25

标签: python multiprocessing quicksort

为了更多地了解python中的进程,我决定使用多个进程实现quicksort。它似乎在具有少于~21K元素的列表上工作正常但是更大(例如25K元素)它只是挂起。

我尝试了import sys; sys.setrecursionlimit(2**30)诀窍无济于事。这只是因为python不喜欢真正的深度递归吗?有趣的是,相同的算法在~50k元素上作为单个过程可以正常工作。

import random, multiprocessing, time

DESIRED_PROC_DEPTH = 2
proc_depth = 1 #start at one to count the parent process that kicks this all off

def main():
    import sys; sys.setrecursionlimit(2**30)
    num_elements = 20000
    list_of_numbers = [random.randint(0,num_elements) for num in range(num_elements)]
    # print list_of_numbers
    simple_start = time.time()
    simple_sorted = simple_qs(list_of_numbers)
    simple_total_time = time.time() - simple_start
    print "Using a single thread we sorted " + str(num_elements) + " elements in " + str(simple_total_time) + " seconds"
    the_q = multiprocessing.Queue()
    sorter = MTQuickSort(list_of_numbers, the_q)
    start = time.time()
    sorter.start()
    sorter.join()
    mt_total_time = time.time() - start
    sorted_list = the_q.get()
    # print sorted_list
    print "Sorted " + str(num_elements) + " elements in " + str(mt_total_time) + " seconds"


class MTQuickSort(multiprocessing.Process):

    def __init__(self, list_to_sort, queue):
        super(MTQuickSort, self).__init__()
        print self.name
        self.queue = queue
        self.list = list_to_sort

    def run(self):
        self.queue.put(self.quicksort(self.list))

    def quicksort(self, list):
        global proc_depth
        global DESIRED_PROC_DEPTH
        if len(list) is 0:
            # base case is that list is empty
            return []
        else:
            pivot = list[0]
            less = [x for x in list[1:] if x <= pivot]
            greater = [x for x in list[1:] if x > pivot]
            if proc_depth < DESIRED_PROC_DEPTH:
                proc_depth += 1
                # We create threads in blocks of two since we partition the list into two parts
                lessor_q = multiprocessing.Queue()
                greater_q = multiprocessing.Queue()
                qs_process1 = MTQuickSort(less, lessor_q)
                qs_process2 = MTQuickSort(greater, greater_q)
                qs_process1.start()
                qs_process2.start()
                qs_process1.join()
                qs_process2.join()
                return lessor_q.get() + [pivot] + greater_q.get()
            else:
                less_than_pivot = self.quicksort(less)
                greater_than_pivot = self.quicksort(greater)
                return less_than_pivot + [pivot] + greater_than_pivot

def simple_qs(list):
    if len(list) is 0:
        # base case is that list is empty
        return []
    else:
        pivot = list[0]
        return simple_qs([x for x in list[1:] if x <= pivot]) + [pivot] + simple_qs([x for x in list[1:] if x > pivot])



if __name__ == '__main__':
    main()

1 个答案:

答案 0 :(得分:2)

这是因为在从队列中获取()结果之前加入()子进程。显然,队列不会急切地从子进程中读取数据。因此,当您调用join()时,调用者的进程将等待,直到子进程完成,但子进程本身正在等待尝试将数据发送到其父进程的管道。

如果我在相应的get()之后移动各种join(),它就可以工作。