I created the following code:
M=20000
sample_all = np.load('sample.npy')
sd = np.zeros(M)
chi_arr = np.zeros((M,4))
sigma_e = np.zeros((M,41632))
mean_sigma = np.zeros(M)
max_sigma = np.zeros(M)
min_sigma = np.zeros(M)
z = np.load('z_array.npy')
prof = np.load('profile_at_sources.npy')
L = np.load('luminosities.npy')
for k in range(M):
    sd[k] = np.array(sp.std(sample_all[k,:]))
    arr = np.genfromtxt('samples_fin1.txt').T[2:6]
    arr_T = arr.T
    chi_arr[k,:] = arr_T[k,:]
    sigma_e[k,:] = np.sqrt(calc(z, prof, chi_arr[k,:], L))
    mean_sigma[k] = np.array(sp.mean(sigma_e[k,:]))
    max_sigma[k] = np.array(sigma_e[k,:].max())
    min_sigma[k] = np.array(sigma_e[k,:].min())
where calc(...) is a function that computes some quantity (its details don't matter for my question).
For M = 20000 this loop takes about 27 hours on my machine. That is far too long... Is there a way to optimize it, perhaps replacing the loop with vectorized operations?
For me it is very natural to write loops; my head thinks in loops for this kind of code... that is my limitation... Can you help me? Thanks
Answer 0 (score: 0):
It looks to me as if each k-th row created in the various arrays is independent of every other iteration of the for loop, and depends only on the corresponding row of sigma_e... so you could parallelize it across many workers. Not sure whether the code is 100% kosher, since you didn't provide a working example. Note that this only works if each k-th iteration is completely independent of every other k-th iteration.
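One detail to get right when splitting the M iterations into per-worker ranges: the boundaries must be integers (in Python 3, M / workers is a float, which range() rejects). A minimal sketch of the chunking arithmetic, using the question's M = 20000 and 100 workers:

```python
M = 20000
workers = 100

# inclusive (start, end) index pairs, one per worker, via integer division
chunks = [(k * M // workers, (k + 1) * M // workers - 1) for k in range(workers)]

print(chunks[0])   # (0, 199)
print(chunks[-1])  # (19800, 19999)
```

The `//` keeps every boundary an int and the chunks cover all M indices exactly once, even when workers does not divide M evenly.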
M=20000
sample_all = np.load('sample.npy')
sd = np.zeros(M)
chi_arr = np.zeros((M,4))
sigma_e = np.zeros((M,41632))
mean_sigma = np.zeros(M)
max_sigma = np.zeros(M)
min_sigma = np.zeros(M)
z = np.load('z_array.npy')
prof = np.load('profile_at_sources.npy')
L = np.load('luminosities.npy')
workers = 100
arr = np.genfromtxt('samples_fin1.txt').T[2:6] # only works if this is really what you're doing to set arr.
def worker(k_start, k_end):
    for k in range(k_start, k_end + 1):
        sd[k] = np.array(sp.std(sample_all[k,:]))
        arr_T = arr.T
        chi_arr[k,:] = arr_T[k,:]
        sigma_e[k,:] = np.sqrt(calc(z, prof, chi_arr[k,:], L))
        mean_sigma[k] = np.array(sp.mean(sigma_e[k,:]))
        max_sigma[k] = np.array(sigma_e[k,:].max())
        min_sigma[k] = np.array(sigma_e[k,:].min())
threads = []
kstart = 0
for k in range(0, workers):
    # integer division so the range bounds are ints (M / workers is a float in Python 3)
    T = threading.Thread(target=worker, args=[k * M // workers, (k + 1) * M // workers - 1])
    threads.append(T)
    T.start()
for t in threads:
    t.join()
EDIT after the comments: It seems there's a mutex that CPython places on all objects (the GIL) that prevents truly parallel access. Use IronPython or Jython to get around this. Also, if you really are just deserializing the same array from samples_fin1.txt on every iteration, you can move that file read outside the loop.
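Independent of any parallelization, the biggest single win is exactly that last point: hoisting the np.genfromtxt call out of the loop (it re-parses the same file M times) and replacing the per-row statistics with axis-wise NumPy reductions. A sketch with small dummy arrays and a hypothetical stand-in for calc(), since the real data files and function aren't shown:

```python
import numpy as np

def calc(z, prof, chi_row, L):
    # hypothetical stand-in for the real calc(); returns one non-negative row
    return np.abs(chi_row.sum()) + np.abs(prof)

M = 4                                # small M for the sketch; the question uses 20000
sample_all = np.random.rand(M, 10)   # stands in for np.load('sample.npy')
z = np.linspace(0.1, 1.0, 5)
prof = np.linspace(1.0, 2.0, 8)
L = np.ones(5)
chi_arr = np.random.rand(M, 4)       # read ONCE, not inside the loop, in the real code

sd = sample_all.std(axis=1)          # one vectorized call replaces M calls to sp.std

sigma_e = np.empty((M, prof.size))
for k in range(M):                   # only calc() still needs a loop here
    sigma_e[k, :] = np.sqrt(calc(z, prof, chi_arr[k, :], L))

mean_sigma = sigma_e.mean(axis=1)    # axis-wise reductions replace the per-row stats
max_sigma = sigma_e.max(axis=1)
min_sigma = sigma_e.min(axis=1)
```

If calc() itself can accept the whole chi_arr at once (broadcasting over rows), the remaining loop disappears too; whether that is possible depends on its implementation, which the question doesn't show.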