I need to parallelize my code: read one line from a parameters file, run the parallelized simulations for it, then read the next line, and so on until the end of the file. This is what I did:
import os
import random
import multiprocessing
from functools import wraps

import numpy as np

def unpack(func):
    # pool.map passes a single tuple; expand it into positional arguments
    @wraps(func)
    def wrapper(arg_tuple):
        return func(*arg_tuple)
    return wrapper

@unpack
def parallel_job(seed, distributioncsv, shift):
    # for each core, create a different file, use different seeds and start
    f = open(distributioncsv, 'w+')
    random.seed(seed)
    np.random.seed(seed)
    # number of simulations each core should run
    threadsim = simnum // threadnum
    for i in range(0, threadsim):
        ...  # do stuff
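For context, the @unpack decorator is there because pool.map hands each element of the iterable to the worker as a single argument; the decorator expands that tuple into positional arguments. Here is a minimal, self-contained sketch of the idea with a toy worker (toy_job and its arguments are just placeholders, not my real code):

from functools import wraps
from multiprocessing import Pool

def unpack(func):
    # pool.map calls the worker with one tuple; this expands it into positional args
    @wraps(func)
    def wrapper(arg_tuple):
        return func(*arg_tuple)
    return wrapper

@unpack
def toy_job(seed, filename, shift):
    # hypothetical worker: just report what it received
    return "seed=%d file=%s shift=%d" % (seed, filename, shift)

if __name__ == '__main__':
    args = [(1, "a.csv", 0), (2, "b.csv", 100)]
    with Pool(2) as pool:
        print(pool.map(toy_job, args))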
My main looks like this: I read the file, loop over its lines and call multiprocessing for each one. First, I define some constants:
if __name__ == '__main__':
    # number of simulations, and number of threads to use
    threadnum = 10
    simnum = threadnum * 10
    # order in file: Network, N, lambda, gamma, k, i0, tauf, folder
    N_f, lamma_f, gamma_f, k_f, i0_f, tauf_f = np.loadtxt("parameters.txt", delimiter=',', dtype=float, usecols=[1, 2, 3, 4, 5, 6], unpack=True)
    folder_f, networkchoice_f = np.loadtxt("parameters.txt", delimiter=',', dtype=str, usecols=[7, 0], unpack=True)
    for i in range(0, len(N_f)):
        # number of nodes
        N = N_f[i]
        # per-node infection probability
        lamma = lamma_f[i]
        # per-node recovery probability
        gamma = gamma_f[i]
        # average network degree or number of new links per node
        k = int(k_f[i])
        # initial number of infected nodes
        i0 = int(i0_f[i])
        # end time (tau) of the simulations
        tauf = tauf_f[i]
        # folder where to save files
        folder = os.getenv("HOME") + folder_f[i]
        # Network to simulate
        networkchoice = networkchoice_f[i]
        # where to put the sum of all the distributions
        distributioncsv = folder + "/distribution.csv"
        # where to put all the figures
        destinationofallfigures = folder + "/Figures/a(k)/"
        # file for the k - E(k) values
        akfile = folder + '/csv/E(ak).csv'
        # plot of the mean epidemics from simulations (t, I)
        avgepidemics = folder + "/Figures/I(t)/average"
        # column names
        name = ['I', 'SI', 'deltas', 't', 'run']
        saveplots = folder + "/Figures/"
        # file for the mean average
        averagecsv = folder + "/csv/average"
        # different seed for each thread
        seed = [j * 2759 + 37 * j**2 + 4757 for j in range(threadnum)]
        # offsets to enumerate my runs without losing track of them
        shift = [j * simnum for j in range(threadnum)]
        # names of the per-thread files to be created
        distribution = [folder + "/csv/distribution_%d.csv" % j for j in range(threadnum)]
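For reference, parameters.txt is a plain comma-separated file with one simulation setup per line, in the column order given in the comment above; a made-up placeholder line (the values are illustrative, not my real parameters) would look like:

erdos-renyi,1000,0.2,0.1,4,10,100,/results/run1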
Here is the relevant part about the parallelization:
        arguments = zip(seed, distribution, shift)
        # print(arguments)
        # begin parallelization
        pool = multiprocessing.Pool(threadnum)
        # spawn threadnum threads and give them the parallel jobs
        pool.map(parallel_job, iterable=arguments)
        pool.close()
        # close the pool and wait for all the threads to be done
        pool.join()
... do other unparallelized stuff and end the loop
At the end of each loop iteration I would expect the memory usage to decrease, since pool.close() and pool.join() are called at that point.
Instead, what happens is that, loop after loop, the memory usage keeps increasing.
Is it because my parallel_job function does not return anything? Should I return None at the end of parallel_job? At the moment I do not return anything.
EDIT: I am now measuring the increase in RAM usage. Unfortunately, the process takes a long time: the last time I launched it, after four hours it had exhausted all the available RAM and swap of my PC (30 GB).
If I launch the non-parallelized version of the program, each loop consumes about 3 GB of RAM.
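For the measurement, I am logging the resident set size of the main process after each pool.join(). Below is a stripped-down sketch of the pool-per-iteration pattern whose memory footprint I am tracking (toy worker, Linux-only RSS read via /proc; my real code runs the simulations instead of the toy job):

import multiprocessing

def current_rss_kib():
    # Linux-only: read the current resident set size from /proc
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # value is reported in kB
    return -1

def toy_job(_):
    # hypothetical stand-in for parallel_job
    return None

if __name__ == '__main__':
    for iteration in range(5):
        pool = multiprocessing.Pool(4)
        pool.map(toy_job, range(40))
        pool.close()
        pool.join()
        print("iteration %d: RSS = %d KiB" % (iteration, current_rss_kib()))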