I need to parallelize my code: read one line from a parameters file, run the parallelized simulations for it, then read the next line, and so on until the end of the file. This is what I did:
import os
import random
import multiprocessing
from functools import wraps

import numpy as np

def unpack(func):
    # pool.map passes a single tuple; expand it into positional arguments
    @wraps(func)
    def wrapper(arg_tuple):
        return func(*arg_tuple)
    return wrapper

@unpack
def parallel_job(seed, distributioncsv, shift):
    # for each core, create a different file, use different seeds and start
    f = open(distributioncsv, 'w+')
    random.seed(seed)
    np.random.seed(seed)
    # number of simulations each core should run
    threadsim = simnum // threadnum
    for i in range(0, threadsim):
        ...  # do stuff
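For context, the @unpack decorator is there because pool.map hands each element of the iterable to the worker as a single argument; the decorator expands that tuple into positional arguments. Here is a minimal, self-contained sketch of the idea with a toy worker (toy_job and its arguments are just placeholders, not my real code):

from functools import wraps
from multiprocessing import Pool

def unpack(func):
    # pool.map calls the worker with one tuple; this expands it into positional args
    @wraps(func)
    def wrapper(arg_tuple):
        return func(*arg_tuple)
    return wrapper

@unpack
def toy_job(seed, filename, shift):
    # hypothetical worker: just report what it received
    return "seed=%d file=%s shift=%d" % (seed, filename, shift)

if __name__ == '__main__':
    args = [(1, "a.csv", 0), (2, "b.csv", 100)]
    with Pool(2) as pool:
        print(pool.map(toy_job, args))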
My main looks like this: I read the file, loop over its lines and call multiprocessing for each one. First, I define some constants:
if __name__ == '__main__':
    # number of simulations, and number of threads to use
    threadnum = 10
    simnum = threadnum * 10
    # order in file: Network, N, lambda, gamma, k, i0, tauf, folder
    N_f, lamma_f, gamma_f, k_f, i0_f, tauf_f = np.loadtxt("parameters.txt", delimiter=',', dtype=float, usecols=[1, 2, 3, 4, 5, 6], unpack=True)
    folder_f, networkchoice_f = np.loadtxt("parameters.txt", delimiter=',', dtype=str, usecols=[7, 0], unpack=True)
    for i in range(0, len(N_f)):
        # number of nodes
        N = N_f[i]
        # per-node infection probability
        lamma = lamma_f[i]
        # per-node recovery probability
        gamma = gamma_f[i]
        # average network degree or number of new links per node
        k = int(k_f[i])
        # initial number of infected nodes
        i0 = int(i0_f[i])
        # end time (tau) of the simulations
        tauf = tauf_f[i]
        # folder where to save files
        folder = os.getenv("HOME") + folder_f[i]
        # Network to simulate
        networkchoice = networkchoice_f[i]
        # where to put the sum of all the distributions
        distributioncsv = folder + "/distribution.csv"
        # where to put all the figures
        destinationofallfigures = folder + "/Figures/a(k)/"
        # file for the k - E(k) values
        akfile = folder + '/csv/E(ak).csv'
        # plot of the mean epidemics from simulations (t, I)
        avgepidemics = folder + "/Figures/I(t)/average"
        # column names
        name = ['I', 'SI', 'deltas', 't', 'run']
        saveplots = folder + "/Figures/"
        # file for the mean average
        averagecsv = folder + "/csv/average"
        # different seed for each thread
        seed = [j * 2759 + 37 * j**2 + 4757 for j in range(threadnum)]
        # offsets to enumerate my runs without losing track of them
        shift = [j * simnum for j in range(threadnum)]
        # names of the per-thread files to be created
        distribution = [folder + "/csv/distribution_%d.csv" % j for j in range(threadnum)]
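For reference, parameters.txt is a plain comma-separated file with one simulation setup per line, in the column order given in the comment above; a made-up placeholder line (the values are illustrative, not my real parameters) would look like:

erdos-renyi,1000,0.2,0.1,4,10,100,/results/run1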
Here is the relevant part about the parallelization:
        arguments = zip(seed, distribution, shift)
        # print(arguments)
        # begin parallelization
        pool = multiprocessing.Pool(threadnum)
        # spawn threadnum threads and give them the parallel jobs
        pool.map(parallel_job, iterable=arguments)
        pool.close()
        # close the pool and wait for all the threads to be done
        pool.join()
... do other unparallelized stuff and end the loop
At the end of each loop iteration I would expect the memory usage to decrease, since pool.close() and pool.join() are called at that point.
Instead, what happens is that, loop after loop, the memory usage keeps increasing.
Is it because my parallel_job function does not return anything? Should I return None at the end of parallel_job? At the moment I do not return anything.
EDIT: I am now measuring the increase in RAM usage. Unfortunately, the process takes a long time: the last time I launched it, after four hours it had exhausted all the available RAM and swap of my PC (30 GB).
If I launch the non-parallelized version of the program, each loop consumes about 3 GB of RAM.
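For the measurement, I am logging the resident set size of the main process after each pool.join(). Below is a stripped-down sketch of the pool-per-iteration pattern whose memory footprint I am tracking (toy worker, Linux-only RSS read via /proc; my real code runs the simulations instead of the toy job):

import multiprocessing

def current_rss_kib():
    # Linux-only: read the current resident set size from /proc
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # value is reported in kB
    return -1

def toy_job(_):
    # hypothetical stand-in for parallel_job
    return None

if __name__ == '__main__':
    for iteration in range(5):
        pool = multiprocessing.Pool(4)
        pool.map(toy_job, range(40))
        pool.close()
        pool.join()
        print("iteration %d: RSS = %d KiB" % (iteration, current_rss_kib()))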