Optimizing a Python loop using vectorization

Time: 2018-02-16 12:56:28

Tags: python loops for-loop vectorization

I wrote the following code:

M=20000
sample_all = np.load('sample.npy')  
sd = np.zeros(M)
chi_arr = np.zeros((M,4))
sigma_e = np.zeros((M,41632))
mean_sigma = np.zeros(M)
max_sigma = np.zeros(M)
min_sigma = np.zeros(M)
z = np.load('z_array.npy')
prof = np.load('profile_at_sources.npy')
L = np.load('luminosities.npy')

for k in range(M):
    sd[k]=np.array(sp.std(sample_all[k,:]))
    arr = np.genfromtxt('samples_fin1.txt').T[2:6]
    arr_T = arr.T
    chi_arr[k,:] = arr_T[k,:]               
    sigma_e[k,:]=np.sqrt(calc(z,prof,chi_arr[k,:], L))
    mean_sigma[k] = np.array(sp.mean(sigma_e[k,:]))
    max_sigma[k] = np.array(sigma_e[k,:].max())
    min_sigma[k] = np.array(sigma_e[k,:].min())

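where calc(...) is a function that computes something (not important for my question).

For M = 20000, this loop takes about 27 hours on my machine. That is too long... Is there a way to optimize it, perhaps by using vectors instead of the loop?

Writing loops comes naturally to me; my head thinks in loops for this kind of code... that is my limitation... Can you help me? Thanks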

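1 Answer:

Answer 0: (score: 0)

It looks to me like each k-th row created in the various arrays is independent of every other iteration of the for loop, and the statistics depend only on the matching row of sigma_e... so you could parallelize it across many workers. I'm not sure the code is 100% kosher, since you didn't provide a working example. Note that this only works if each k-th iteration is completely independent of every other iteration.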

import threading
import numpy as np

M = 20000
sample_all = np.load('sample.npy')  
sd = np.zeros(M)
chi_arr = np.zeros((M,4))
sigma_e = np.zeros((M,41632))
mean_sigma = np.zeros(M)
max_sigma = np.zeros(M)
min_sigma = np.zeros(M)
z = np.load('z_array.npy')
prof = np.load('profile_at_sources.npy')
L = np.load('luminosities.npy')
workers = 100
arr = np.genfromtxt('samples_fin1.txt').T[2:6] # only works if this is really what you're doing to set arr.

def worker(k_start, k_end):
    # Process rows k_start..k_end (inclusive); calc() is the function from the question.
    arr_T = arr.T
    for k in range(k_start, k_end + 1):
        # np.std/np.mean stand in for sp.std/sp.mean (assumed to be SciPy's old NumPy aliases).
        sd[k] = np.std(sample_all[k, :])
        chi_arr[k, :] = arr_T[k, :]
        sigma_e[k, :] = np.sqrt(calc(z, prof, chi_arr[k, :], L))
        mean_sigma[k] = np.mean(sigma_e[k, :])
        max_sigma[k] = sigma_e[k, :].max()
        min_sigma[k] = sigma_e[k, :].min()

threads = []
for k in range(workers):
    # Integer division (//) keeps the row bounds valid indices under Python 3.
    T = threading.Thread(target=worker, args=(k * M // workers, (k + 1) * M // workers - 1))
    threads.append(T)
    T.start()

for t in threads:
  t.join()
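
As a side note on the vectorization part of the question: the per-row statistics do not need a Python loop at all, because NumPy reductions accept an `axis` argument. The following is only a minimal sketch on small random stand-in arrays (the shapes and data are assumptions, not the real contents of `sample.npy`), and `calc` is left out because its internals are not shown in the question.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 5                                 # small stand-in for M = 20000
sample_all = rng.normal(size=(M, 8))  # hypothetical stand-in for sample.npy
sigma_e = rng.random(size=(M, 6))     # hypothetical stand-in for the filled sigma_e

# One call per statistic, reducing along each row (axis=1).
sd = sample_all.std(axis=1)
mean_sigma = sigma_e.mean(axis=1)
max_sigma = sigma_e.max(axis=1)
min_sigma = sigma_e.min(axis=1)

# Cross-check against the row-by-row version used in the loop.
sd_loop = np.array([np.std(sample_all[k, :]) for k in range(M)])
print(np.allclose(sd, sd_loop))
```

This only removes the cheap part of the loop. If `calc` accounts for most of the 27 hours, the gain will be small unless `calc` itself can be rewritten to accept all rows at once.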

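Edit following the comments: It seems there's a mutex that CPython places on all objects (the global interpreter lock, GIL), which prevents threads from running Python code in parallel. Use IronPython or Jython to get around this. Also, if you really are just deserializing the same array from samples_fin1.txt on every iteration, you can move the file read outside the loop.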
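
A different way around the lock, not the one suggested above, is to stay on CPython and use processes instead of threads, since each process has its own interpreter. The following is a rough sketch using the standard-library `concurrent.futures.ProcessPoolExecutor`. `calc_stub`, `process_row` and `run` are hypothetical names introduced for illustration, and `calc_stub` is a placeholder for the real `calc(z, prof, chi, L)`, whose behavior is not shown in the question.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def calc_stub(chi):
    # Hypothetical stand-in for calc(z, prof, chi, L); returns non-negative values.
    return np.outer(chi, chi).ravel()

def process_row(chi):
    # All the work for one k: returns (mean, max, min) of that row's sigma_e.
    sigma = np.sqrt(calc_stub(chi))
    return sigma.mean(), sigma.max(), sigma.min()

def run(chi_arr, n_workers=2):
    # Rows are distributed over separate processes, so no GIL is shared.
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(process_row, chi_arr, chunksize=16))
    return np.array(results)  # shape (number of rows, 3)

if __name__ == "__main__":
    # Random stand-in for the four parameter columns read from samples_fin1.txt.
    chi_arr = np.abs(np.random.default_rng(0).normal(size=(64, 4)))
    stats = run(chi_arr)
    print(stats.shape)
```

Returning only the three summary values per row keeps the data sent back from each process small. If the full `sigma_e` rows are needed, they can be returned as well, at the cost of more inter-process transfer. The number of workers would normally be close to the number of CPU cores rather than 100.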