I'm trying to improve the efficiency of a bootstrap algorithm written in Python. The computation is reasonably fast when the matrix is relatively small, but it slows down dramatically when n and k get large.

Since the algorithm essentially boils down to array operations, is there any way to make it more efficient, either by optimizing the algorithm itself or by rewriting it to use a package such as pycuda or tensorflow, given that I have access to GPU resources?
import numpy as np

n = 5000
k = 3000
ary = np.random.choice([0, 1], size=(n, k))  # n x k matrix of 0/1 entries

lst = []
cols = list(range(ary.shape[1]))
while len(lst) < ary.shape[1]:
    print(len(lst), ary.shape[1])
    a = [lst + [x] for x in cols]    # columns picked so far plus one candidate, for every candidate
    b = [ary[:, x] for x in a]       # the corresponding sub-matrices, each (n, len(lst)+1)
    c = np.asarray(b)                # shape (k, n, len(lst)+1)
    d = c.sum(axis=2)                # row sums per candidate
    d = np.repeat(d[..., None], c.shape[-1], 2)  # broadcast back to c's shape
    d1 = np.divide(c, d)             # normalize each row; 0/0 -> nan
    d2 = np.nanmean(d1, axis=1)      # mean over rows, ignoring nans
    d2 = np.nan_to_num(d2)
    d3 = d2[:, -1]                   # score of each candidate column
    d4 = d3 / np.sum(d3)             # turn scores into sampling probabilities
    lst.extend([np.random.choice(cols, p=d4)])
    print(d4.shape)
print(lst)
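
Update, in case it helps answerers: one direction I've been looking at is that only the last column of d2 is ever used, so (if I've read my own indexing right) the per-iteration candidate enumeration should collapse to a single (n, k) division against a running row sum, dropping the work per iteration from roughly O(n·k·m) (with m = len(lst)+1) to O(n·k). A minimal NumPy sketch of that idea, assuming the 0/0 -> nan treatment below reproduces the nanmean behaviour above:

import numpy as np

n, k = 5000, 3000
ary = np.random.choice([0, 1], size=(n, k))

lst = []
base = np.zeros(n)                   # running row sums over the columns picked so far
while len(lst) < k:
    denom = base[:, None] + ary      # (n, k): row sum if candidate column x were appended
    with np.errstate(invalid='ignore'):
        vals = ary / denom           # 0/0 -> nan, mirroring the nanmean treatment above
    d3 = np.nan_to_num(np.nanmean(vals, axis=0))  # mean contribution of each candidate
    d4 = d3 / d3.sum()
    pick = np.random.choice(k, p=d4)
    lst.append(pick)
    base += ary[:, pick]             # update the running sum instead of re-summing
print(lst)

If that algebra holds, the rewrite may already make the problem tractable on the CPU for these sizes, but I'd still be interested in the GPU angle.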
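On the GPU side, since CuPy mirrors most of the NumPy API one-to-one, the same sketch ports nearly verbatim; this is just an illustration of the pattern (CuPy here as a stand-in for pycuda/tensorflow, which would need more rewriting), assuming CuPy is installed against a working CUDA toolkit:

import numpy as np
import cupy as cp   # assumption: CuPy installed, CUDA GPU available

n, k = 5000, 3000
ary = cp.random.randint(0, 2, size=(n, k))    # build the 0/1 matrix directly on the GPU

lst = []
base = cp.zeros(n)
while len(lst) < k:
    denom = base[:, None] + ary
    vals = ary / denom                        # 0/0 -> nan on the device
    d3 = cp.nan_to_num(cp.nanmean(vals, axis=0))
    d4 = cp.asnumpy(d3 / d3.sum())            # copy the k probabilities back to the host
    pick = np.random.choice(k, p=d4)          # drawing one index per iteration is cheap on the CPU
    lst.append(pick)
    base += ary[:, pick]
print(lst)

The draw itself is done on the host with np.random.choice so the sketch doesn't depend on device-side weighted sampling; only the heavy (n, k) array work runs on the GPU.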