Python multiprocessing returns more output than given

Date: 2016-07-06 12:23:06

Tags: python multiprocessing python-multiprocessing

I am currently trying to use multiprocessing in my simulation runs so that different input values are evaluated at the same time.

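Over the past few weeks I have searched a lot and ended up with something that may not be very pretty, but (sort of) works. My problem now is that it returns more output than the tasks I give it, and I don't understand why.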

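Sometimes every simulation run returns only one value, as expected, but in the example below simulation run 5, for instance, returns [23, 89], where the 23 is actually the result of simulation run 1. Which simulation run produces more output than expected also varies between executions. When I increase the number of periods to, e.g., 2, a run can return 4 output values instead of 2, and I can't figure out why.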

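Could someone please give me a hint on how to change this? I couldn't find an answer and I'm getting really frustrated :( Any suggestions on how to improve my code would also be much appreciated, since I'm quite new to Python and I like it so far :)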

Here is the simplified code I'm using:

import numpy as np
from multiprocessing import Process, Queue
import multiprocessing
from itertools import repeat

class Simulation(Process):
    Nr = 1
    Mean = 5
    StdDev = 3
    Periods = 10
    Result = []

    def Generate_Value(self):
        GeneratedValue = max(int(round(np.random.normal(self.Mean, self.StdDev), 0)), 0)
        return GeneratedValue

    def runSimulation(self):
        for i in range(self.Periods):
            self.Result.append(self.Generate_Value())
        return self.Result

def worker(Mean, stdDev, Periods, Nr, queue):
    Sim = Simulation()
    Sim.Nr = Nr
    Sim.Periods = Periods
    Sim.Mean = Mean
    Sim.StdDev = stdDev
    Results = Sim.runSimulation()
    queue.put(Results)
    print("Simulation run " + str(Nr) + " done with a result of " + str(Results)
          + " (Input: mean: " + str(Mean) + ", std. dev.: " + str(stdDev) + ")")

if __name__ == '__main__':
    m = multiprocessing.Manager()
    queue = m.Queue()
    CPUS = multiprocessing.cpu_count() # CPUS = 8
    WORKERS = multiprocessing.Pool(processes=CPUS)

    Mean = [50, 60, 70, 80, 90]
    StdDev = [10, 10, 10, 10, 10]
    Periods = 1
    Nr = list(range(1,len(Mean) + 1))

    WORKERS.starmap(worker, zip(Mean, StdDev, repeat(Periods), Nr, repeat(queue)))
    WORKERS.close()
    WORKERS.join()

    FinalSimulationResults = []
    for i in range(len(Mean)):
        FinalSimulationResults.append(queue.get())
    print(FinalSimulationResults)

This results in output like the following:

Simulation run 1 done with a result of [23] (Input: mean: 50, std. dev.: 10)
Simulation run 2 done with a result of [55] (Input: mean: 60, std. dev.: 10)
Simulation run 3 done with a result of [64] (Input: mean: 70, std. dev.: 10)
Simulation run 5 done with a result of [23, 89] (Input: mean: 90, std. dev.: 10)
Simulation run 4 done with a result of [78] (Input: mean: 80, std. dev.: 10)
[[23], [55], [64], [23, 89], [78]]
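The symptom in this output can be reproduced without multiprocessing at all. The following minimal sketch uses a stripped-down, hypothetical Simulation class (fixed values instead of random ones, and a hypothetical run method) to show what happens when two tasks are handled one after the other in the same worker process while Result is declared at class level:

```python
class Simulation:
    # Class attribute: one list object shared by every instance in this process.
    Result = []

    def run(self, value):
        # append mutates the shared class-level list; it does not create a new one
        self.Result.append(value)
        return self.Result


first = Simulation().run(23)   # like simulation run 1
second = Simulation().run(89)  # a later task picked up by the same worker process
print(first, second)           # both names refer to the same list: [23, 89] [23, 89]
print(first is second)         # True
```

In the original code, run 1 still prints [23] because the value is put on the queue (and pickled) before run 5 appends 89 to the same list.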

It works now :). It is not as fast as I expected (only about 2x faster on 8 cores), but for anyone who runs into the same problem, here is my working code:

import time
import numpy as np
import multiprocessing
from itertools import repeat

class Simulation():
    def __init__(self, Nr, Mean, Std_dev, Periods):
        self.Result = []
        self.Nr = Nr
        self.Mean = Mean
        self.StdDev = Std_dev
        self.Periods = Periods

    def Generate_Value(self):
        GeneratedValue = max(int(round(np.random.normal(self.Mean, self.StdDev), 0)), 0)
        return GeneratedValue

    def runSimulation(self):
        for i in range(self.Periods):
            self.Result.append(self.Generate_Value())
        return self.Result

def worker(Mean, stdDev, Periods, Nr, queue):
    Sim = Simulation(Nr=Nr,Mean=Mean,Std_dev=stdDev,Periods=Periods)
    Results = Sim.runSimulation()
    queue.put(Results)
    print("Simulation run " + str(Nr) + " done with a result of " + str(Results)
          + " (Input: mean: " + str(Mean) + ", std. dev.: " + str(stdDev) + ")")

if __name__ == '__main__':
    start = time.time()
    m = multiprocessing.Manager()
    queue = m.Queue()
    CPUS = multiprocessing.cpu_count()
    WORKERS = multiprocessing.Pool(processes=CPUS)

    Mean = [50, 60, 70, 80, 90]
    StdDev = [10, 10, 10, 10, 10]
    Periods = 100
    Nr = list(range(1,len(Mean) + 1))

    WORKERS.starmap(worker, zip(Mean, StdDev, repeat(Periods), Nr, repeat(queue)))
    WORKERS.close()
    WORKERS.join()

    FinalSimulationResults = []
    for i in range(len(Mean)):
        FinalSimulationResults.append(queue.get())

    print(FinalSimulationResults)

1 Answer:

Answer 0 (score: 1)

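The way you assign the attributes in the class body makes them class attributes, so they are shared between all instances of the class. In your case this is not obvious at first, because each process only ever has one instance at a time, and the class object itself is not shared between processes. However, if a worker finishes early enough, it can pick up another task; the class object in that process is then reused, and the class attributes behave exactly as class attributes are supposed to. Nr, Mean, StdDev and Periods are reassigned on each instance in worker, which shadows the class values, but Result is only ever appended to, so every instance in that process mutates the same class-level list, which still contains the values from the previous task.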

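To avoid this, you should always assign instance attributes (i.e. attributes that should differ from instance to instance) in the __init__ method: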

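import numpy as np

class Simulation:
    # No Process base class needed: the Pool already runs this code in worker processes.

    def __init__(self, nr, mean, std_dev, periods):
        # Instance attributes: every Simulation object gets its own values
        # and, crucially, its own fresh result list.
        self.nr = nr
        self.mean = mean
        self.std_dev = std_dev
        self.periods = periods
        self.result = []

    def Generate_Value(self):
        # Draw from a normal distribution, round to an int and clip at 0.
        return max(int(round(np.random.normal(self.mean, self.std_dev))), 0)

    def runSimulation(self):
        for i in range(self.periods):
            self.result.append(self.Generate_Value())
        return self.result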

For details, see the documentation.

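That said, I don't think you should use the Process class the way you are using it. A Pool creates the worker processes for you automatically; you only need to tell it what to do. So, rewriting your code: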

import multiprocessing
from itertools import repeat

import numpy as np


def task(mean, std_dev, periods):
    # A fresh list per call, so nothing leaks between tasks run by the same worker.
    results = []
    for i in range(periods):
        results.append(max(int(round(np.random.normal(mean, std_dev))), 0))
    return results


if __name__ == '__main__':
    Mean = [50, 60, 70, 80, 90]
    StdDev = [10, 10, 10, 10, 10]
    Periods = 1

    cpu_count = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=cpu_count)
    # starmap returns the task results in input order, so no Manager queue is needed.
    final_results = pool.starmap(task, zip(Mean, StdDev, repeat(Periods)))
    pool.close()
    pool.join()

    print(final_results)

This should work (untested).