Why do processes in multiprocessing take different amounts of time to finish?

Posted: 2018-08-01 13:26:58

Tags: python-3.x python-2.7 multiprocessing

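I am running a program that uses multiprocessing to process a DataFrame of 3 lakh (300,000) rows. I do this on a 64-core VM, using 62 processes created with multiprocessing.Process in Python. Each process is given 4,900 rows.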
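With these numbers, 300,000 / 4,900 ≈ 61.2, so the data splits into 62 batches: 61 full batches of 4,900 rows and a final batch of only 1,100 rows. A small sketch of that split, mirroring the half-open `(start, end)` ranges the code below builds; the helper name `batch_bounds` is hypothetical and not part of the question.

```python
def batch_bounds(n_rows, batch_size):
    """Return half-open (start, end) index pairs covering rows 0..n_rows-1."""
    return [(start, min(start + batch_size, n_rows))
            for start in range(0, n_rows, batch_size)]

bounds = batch_bounds(300000, 4900)
print(len(bounds))   # 62
print(bounds[-1])    # (298900, 300000)
```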

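Strangely, the processes take very different amounts of time to finish. The first process completed its work in 15 minutes, while the last one took more than 70 minutes. Below is the code I use for the multiprocessing.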

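import multiprocessing

# define dataframe here
data_thread = data   # `data` is the pandas DataFrame to process (defined elsewhere)
uid = "final"   ### make sure to change uid

batch_size = 4900
counter = 0
datalen = len(data_thread)
Flag = True
processes = []
indices = []   # (start, end) row range of each batch; used below but never defined before

# Split the DataFrame into consecutive batches of batch_size rows and
# create (but do not yet start) one process per batch.
# `process(data_split, uid, threadName, start, end)` is the worker function, defined elsewhere.
while Flag:
    start = counter * batch_size
    end = min(datalen, start + batch_size)
    if end >= datalen:
        Flag = False   # this is the last, possibly shorter, batch

    indices.append((start, end))
    data_split = data_thread.iloc[start:end]
    threadName = "process_" + str(counter)
    processes.append(multiprocessing.Process(target=process, args=(data_split, uid, threadName, start, end)))
    counter = counter + 1

# Start every process; they all run concurrently.
for startCount, t in enumerate(processes):
    try:
        t.start()
        print("Started: process_" + str(startCount))
    except Exception as e:
        print("Error encountered while starting process_%d %s: %s" % (startCount, str(indices[startCount]), e))

# Wait for the processes in list order: join() blocks until that process exits,
# so a "Joined" line is never printed before the lines of all earlier processes.
for endCount, t in enumerate(processes):
    t.join()
    print("Joined: process_" + str(endCount))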
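To compare per-process durations directly, each process can time itself and send the result back to the parent, rather than the duration being inferred from when its "Joined" line appears. A minimal sketch, assuming a hypothetical worker `timed_worker` that runs a synthetic loop in place of the real row processing; `run_timed` and the workload sizes are illustrative only. The parent drains the queue before joining, following the `multiprocessing` guideline that a process which has put items on a queue should not be joined before those items are consumed.

```python
import multiprocessing
import time

def timed_worker(name, n_iters, results):
    """Hypothetical worker: does synthetic CPU work and reports its own wall time."""
    t0 = time.perf_counter()
    total = 0
    for i in range(n_iters):   # stand-in for the real per-row processing
        total += i * i
    elapsed = time.perf_counter() - t0
    results.put((name, elapsed))

def run_timed(workloads):
    """Start one process per workload and return {name: seconds}."""
    results = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=timed_worker,
                                     args=("process_%d" % i, n, results))
             for i, n in enumerate(workloads)]
    for p in procs:
        p.start()
    # Collect one (name, seconds) pair per process before joining.
    timings = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    return timings

if __name__ == "__main__":
    timings = run_timed([100000, 200000, 400000])
    for name, secs in sorted(timings.items()):
        print("%s: %.3f s" % (name, secs))
```

The `if __name__ == "__main__":` guard is needed on platforms that start workers with the "spawn" method (for example Windows), where each child re-imports the main module.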

0 answers