Python multiprocessing: calling multiple methods

Asked: 2014-11-25 18:05:39

Tags: python parallel-processing multiprocessing pickle

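I have a Python class that uses a multiprocessing pool to process and clean a large dataset. The method that does most of the cleaning is 'dataCleaner', which needs to call a second method, 'processObservation'. I am new to Python multiprocessing, and I cannot figure out how to ensure that 'processObservation' is callable from 'cleanData' when new processes are spawned. How can I do this? I would prefer to keep all of these methods inside the class. I suspect it has something to do with the '__call__' definition, but I am not sure how to modify it properly.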

def processData(self, dataset, num_procs = mp.cpu_count()):
    dataSize = len(dataset)
    outputDict = dict()
    procs = mp.Pool(processes = num_procs, maxtasksperchild = 1)

    # Generate data chunks for processing.
    chunk = dataSize / num_procs
    dataChunk = [(i, i + chunk) for i in range(0, dataSize, chunk)]
    count = 1
    print 'Number of data chunks %d' %len(dataChunk)
    for i in dataChunk:
        procs.apply_async(self.dataCleaner, args = (dataset[i[0]:i[1]], count, ))
        count += 1
    procs.close()
    procs.join()

def cleanData(self, data, procNumber):
    print 'Spawning new process: %d' %os.getpid()
    tempDict = dict()
    print len(data)
    for obs in data:
        key, value = processObservation(obs)
        tempDict[key] = value
    cPickle.dump(tempDict, open( '../dataMP/cleanedData_' + str(procNumber) + '.p', 'wb'))

def __call__(self, dataset, count):
    return self.cleanData(dataset, count)

1 answer:

Answer 0 (score: 1)

It is hard to say what is going on, because you have not provided reproducible code or an error message.
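A likely reason no error shows up: an exception raised inside a task submitted with apply_async is only reported when you call get() on the returned result object. Below is a minimal sketch of surfacing it, assuming Python 3 (the question uses Python 2 syntax). The names failing_task and run_and_collect_error are hypothetical and stand in for your own cleaning code.

```python
import multiprocessing as mp


def failing_task(x):
    # Hypothetical worker that always fails, standing in for a broken cleaning step.
    raise ValueError("cannot process %r" % (x,))


def run_and_collect_error():
    """Submit one task and return the message of the exception raised in the worker."""
    pool = mp.Pool(processes=1)
    try:
        # apply_async returns immediately with an AsyncResult; nothing is raised here.
        result = pool.apply_async(failing_task, args=(42,))
        try:
            # get() re-raises the worker's exception in the parent process.
            result.get(timeout=30)
        except ValueError as exc:
            return str(exc)
        return None
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(run_and_collect_error())
```

In the question's loop the return value of apply_async is discarded, so keeping those result objects in a list and calling get() on each after submission should reveal what is actually failing.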

However, your problem is most likely caused by using multiprocessing inside a class.

See: "Using multiprocessing in a class" and "Multiprocessing: How to use Pool.map on a function defined in a class?"