Using multiprocessing in a subprocess

Time: 2013-01-29 13:11:24

Tags: python multiprocessing

On Windows, any use of multiprocessing has to be guarded by a check that the process is main; otherwise an infinite loop of process creation occurs.

I tried changing the name of the process to the name of the subprocess so that I could use multiprocessing from inside a class or function that I call, but with no luck. Is this even possible? So far I have not been able to use multiprocessing unless it runs from the main process.

If it is possible, could someone provide an example of how to use multiprocessing in a class or function that is called from a higher-level process? Thanks.

Edit:

Here is an example. The first one works, but everything is done in one file: simplemtexample3.py:

import random
import multiprocessing
import math

def mp_factorizer(nums, nprocs):
    #guard the process
    #print __name__
    if __name__ == '__main__':
        out_q = multiprocessing.Queue()
        chunksize = int(math.ceil(len(nums) / float(nprocs)))
        procs = []
        for i in range(nprocs):

            p = multiprocessing.Process(
                    target=worker,            
                    args=(nums[chunksize * i:chunksize * (i + 1)],
                          out_q))
            procs.append(p)
            p.start()

        # Collect all results into a single result list. We know how many
        # result lists to expect.
        resultlist = []
        for i in range(nprocs):
            temp=out_q.get()
            index =0
            #print temp
            for i in temp:
                resultlist.append(temp[index][0][0:])
                index +=1

        # Wait for all worker processes to finish
        for p in procs:
            p.join()
            resultlist2 = [x for x in resultlist if x != []]
        return resultlist2

def worker(nums, out_q):
    """ The worker function, invoked in a process. 'nums' is a
        list of numbers to factor. The results are placed in
        a dictionary that's pushed to a queue.
    """
    outlist = []

    for n in nums:
        newnumber= n*2
        newnumberasstring = str(newnumber)
        if newnumber:
            outlist.append(newnumberasstring)
    out_q.put(outlist)

l = []
for i in range(80):
    l.append(random.randint(1,8))

print mp_factorizer(l, 4)

However, when I try to call mp_factorizer from another file, it does not work because of the if __name__ == '__main__' guard:

simplemtexample.py

import random
import multiprocessing
import math

def mp_factorizer(nums, nprocs):
    #guard the process
    #print __name__
    if __name__ == '__main__':
        out_q = multiprocessing.Queue()
        chunksize = int(math.ceil(len(nums) / float(nprocs)))
        procs = []
        for i in range(nprocs):

            p = multiprocessing.Process(
                    target=worker,            
                    args=(nums[chunksize * i:chunksize * (i + 1)],
                          out_q))
            procs.append(p)
            p.start()

        # Collect all results into a single result list. We know how many
        # result lists to expect.
        resultlist = []
        for i in range(nprocs):
            temp=out_q.get()
            index =0
            #print temp
            for i in temp:
                resultlist.append(temp[index][0][0:])
                index +=1

        # Wait for all worker processes to finish
        for p in procs:
            p.join()
            resultlist2 = [x for x in resultlist if x != []]
        return resultlist2

def worker(nums, out_q):
    """ The worker function, invoked in a process. 'nums' is a
        list of numbers to factor. The results are placed in
        a dictionary that's pushed to a queue.
    """
    outlist = []

    for n in nums:
        newnumber= n*2
        newnumberasstring = str(newnumber)
        if newnumber:
            outlist.append(newnumberasstring)
    out_q.put(outlist)

startsimplemtexample.py

import simplemtexample as smt
import random

l = []
for i in range(80):
    l.append(random.randint(1,8))

print smt.mp_factorizer(l, 4)

2 Answers:

Answer 0 (score: 2)

If you want to use multiprocessing, if __name__ == '__main__' is mandatory (at least on Windows).

On Windows it works like this: for every worker process to be spawned, Windows automatically starts the main process again and re-reads all the required files. However, only the first process that was started is named main. That is why guarding the execution of mp_factorizer with if __name__ == '__main__' prevents multiprocessing from creating an infinite loop.
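As a side note, in Python 3 you can query which process-creation behavior applies on the current platform; a minimal sketch (the helper name is my own, not from the answer):

```python
import multiprocessing

def start_method():
    # On Windows the only start method is "spawn", which re-imports the
    # main module in every child process; on most Unix systems the
    # default is "fork", which copies the parent process instead. That
    # difference is why a missing guard only loops on Windows.
    return multiprocessing.get_start_method()
```

Running this on Windows returns "spawn"; on Linux it typically returns "fork".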

So Windows essentially needs to re-read the file that contains the worker, plus all functions the worker calls, once for each worker. By guarding mp_factorizer we make sure that no additional workers are created, while Windows can still execute the worker itself. That is why multiprocessing examples that keep all code in one file guard the creation of the workers directly (as mp_factorizer does here) but not the worker function, so Windows can still execute it. If all the code were in one file and the whole file were guarded, no worker could be created.
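In modern Python 3 syntax, the one-file pattern just described (guard the spawning code, leave the worker unguarded) can be sketched like this; the doubling logic mirrors the question's example:

```python
import multiprocessing

def worker(nums, out_q):
    # Children re-import this module on Windows, so only definitions
    # may sit at the top level; the actual work lives in functions.
    out_q.put([str(n * 2) for n in nums])

def mp_factorizer(nums, nprocs):
    out_q = multiprocessing.Queue()
    chunksize = -(-len(nums) // nprocs)  # ceiling division
    procs = []
    for i in range(nprocs):
        p = multiprocessing.Process(
            target=worker,
            args=(nums[chunksize * i:chunksize * (i + 1)], out_q))
        procs.append(p)
        p.start()
    results = []
    for _ in procs:
        results.extend(out_q.get())  # drain the queue before joining
    for p in procs:
        p.join()
    return results

if __name__ == '__main__':
    # Only the process actually named __main__ reaches this line, so
    # spawned children never re-enter mp_factorizer.
    print(mp_factorizer(list(range(8)), 2))
```

Note that the results arrive in whatever order the children finish, so the output order is not guaranteed.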

If the multiprocessing code lives in another class or module and gets called from there, if __name__ == '__main__': needs to be implemented directly above the call:

mpteststart.py

import random
import mptest as smt

l = []
for i in range(4):
    l.append(random.randint(1,8))
print "Random numbers generated"
if __name__ == '__main__':
    print smt.mp_factorizer(l, 4)

mptest.py

import multiprocessing
import math

print "Reading mptest.py file"
def mp_factorizer(nums, nprocs):

    out_q = multiprocessing.Queue()
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    procs = []
    for i in range(nprocs):

        p = multiprocessing.Process(
                target=worker,            
                args=(nums[chunksize * i:chunksize * (i + 1)],
                      out_q))
        procs.append(p)
        p.start()

    # Collect all results into a single result list. We know how many
    # result lists to expect.
    resultlist = []
    for i in range(nprocs):
        temp=out_q.get()
        index =0
        #print temp
        for i in temp:
            resultlist.append(temp[index][0][0:])
            index +=1

    # Wait for all worker processes to finish
    for p in procs:
        p.join()
        resultlist2 = [x for x in resultlist if x != []]
    return resultlist2

def worker(nums, out_q):
    """ The worker function, invoked in a process. 'nums' is a
        list of numbers to factor. The results are placed in
        a dictionary that's pushed to a queue.
    """
    outlist = []

    for n in nums:
        newnumber= n*2
        newnumberasstring = str(newnumber)
        if newnumber:
            outlist.append(newnumberasstring)
    out_q.put(outlist)

In the code above, if __name__ == '__main__' has been removed, because it is already present in the calling file.

The result, however, is somewhat unexpected:

Reading mptest.py file
random numbers generated
Reading mptest.py file
random numbers generated
worker started
Reading mptest.py file
random numbers generated
worker started
Reading mptest.py file
random numbers generated
worker started
Reading mptest.py file
random numbers generated
worker started
['1', '1', '4', '1']

Multiprocessing is prevented from executing endlessly, but the rest of the code is still executed several times (the random number generation in this case). This will not only cause a performance decrease, it can also lead to other nasty bugs. The solution is to protect the whole main process from being repeatedly executed by Windows whenever multiprocessing is used somewhere:

mpteststart.py

import random
import mptest as smt

if __name__ == '__main__':  
    l = []
    for i in range(4):
        l.append(random.randint(1,8))
    print "random numbers generated"   
    print smt.mp_factorizer(l, 4)

Now all we get back is the desired result, and the random numbers are generated only once:

Reading mptest.py file
random numbers generated
Reading mptest.py file
worker started
Reading mptest.py file
worker started
Reading mptest.py file
worker started
Reading mptest.py file
worker started
['1', '6', '2', '1']

Note that in this example, mpteststart.py is the main process. If it is not, if __name__ == '__main__' has to be moved up the calling chain until it is in the main process. Once the main process is guarded in that way, there is no more unwanted repeated code execution.

Answer 1 (score: 1)

Windows lacks os.fork. So on Windows, the multiprocessing module starts a new Python interpreter and (re)imports the script that calls multiprocessing.Process.

The purpose of using if __name__ == '__main__' is to protect the call to multiprocessing.Process from being invoked again when the script is re-imported. (If you don't protect it, you get a fork bomb.)

If you call multiprocessing.Process from within a class or function that will not be invoked when the script is re-imported, then there is no problem. Go ahead and use multiprocessing.Process as usual.
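A minimal Python 3 sketch of that safe pattern, using multiprocessing.Pool for brevity (the names here are illustrative, not from the question):

```python
import multiprocessing

def double(n):
    return n * 2

def run_parallel(nums, nprocs=2):
    # The Pool is created only when this function is explicitly
    # called, never as a side effect of importing the module, so
    # re-importing the script on Windows cannot trigger it again.
    with multiprocessing.Pool(nprocs) as pool:
        return pool.map(double, nums)
```

Unlike a hand-rolled queue, pool.map also returns the results in input order.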