I have the following code:
np.array([points[label==k].mean(axis = 0) for k in range(self.k)])
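points is an n x d array and label is a 1 x n array whose values go up to k, where k is a number.
My goal is to drop the axis argument and still get the same result. I would also like to rewrite the part that indexes the array with label == k.
Does anyone know a way to do this?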
Answer 0 (score: 1)
I'm guessing you are looking for a vectorized solution. Here's one based on matrix-multiplication -
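import numpy as np

def matmul(points, label):
    k = label.max()+1
    # mask[i, j] is True when point j carries label i; shape (k, n)
    mask = label == np.arange(k)[:,None]
    # per-label sum of points divided by the per-label member count
    out = mask.dot(points)/mask.sum(1,keepdims=True)
    return out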
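To see what the mask is doing, here is a minimal sketch on a tiny made-up input. The five points and their labels are assumptions chosen for illustration and are not part of the question.

```python
import numpy as np

# Hypothetical toy data: 5 points in 2-D, labels in {0, 1, 2}.
points = np.array([[0., 0.], [2., 2.], [1., 3.], [5., 1.], [3., 5.]])
label = np.array([0, 0, 1, 2, 1])

k = label.max() + 1
# Broadcasting (n,) against (k, 1) gives a (k, n) boolean membership matrix.
mask = label == np.arange(k)[:, None]

# Row i of the product is the sum of all points whose label is i.
sums = mask.dot(points)
# Members per label, kept as shape (k, 1) so it broadcasts over the d columns.
counts = mask.sum(1, keepdims=True)
means = sums / counts
```

Each row of mask acts as a 0/1 selector, so the matrix product takes the place of both the boolean indexing and the mean(axis=0) call in the loop.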
Here's another one with np.add.reduceat -
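def add_reduceat(points, label):
    k = label.max()+1
    # sort the points so that equal labels are contiguous
    sidx = label.argsort()
    ps = points[sidx]
    ls = label[sidx]
    # start index of each run of equal labels, plus the end of the array
    cutidx = np.flatnonzero(np.r_[True,ls[:-1] != ls[1:],True])
    lens = np.diff(cutidx)
    # labels that never occur keep NaN in their row
    out = np.full((k,points.shape[1]),np.nan)
    idx_rows = ls[cutidx[:-1]]
    # sum each run, then divide by its length to get the mean
    mean_vals = np.add.reduceat(ps,cutidx[:-1],axis=0)/lens[:,None]
    out[idx_rows] = mean_vals
    return out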
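The intermediate arrays are easier to follow on a small case. This is only a sketch of the steps: the toy data is made up, and kind="stable" is an addition here so that the sorted order is deterministic.

```python
import numpy as np

# Hypothetical toy data: 5 points in 2-D, labels in {0, 1, 2}.
points = np.array([[0., 0.], [2., 2.], [1., 3.], [5., 1.], [3., 5.]])
label = np.array([0, 0, 1, 2, 1])

# Stable sort keeps the original order inside each label group.
sidx = label.argsort(kind="stable")
ps, ls = points[sidx], label[sidx]

# Indices where a new run of equal labels begins, plus a final end marker.
cutidx = np.flatnonzero(np.r_[True, ls[:-1] != ls[1:], True])
# Length of each run, i.e. how many points carry each label that occurs.
lens = np.diff(cutidx)

# reduceat sums the rows ps[cutidx[i]:cutidx[i+1]] for every start index.
group_sums = np.add.reduceat(ps, cutidx[:-1], axis=0)
group_means = group_sums / lens[:, None]
```

Here ls is [0, 0, 1, 1, 2], so the runs start at 0, 2 and 4, and cutidx carries the extra end marker 5 so that np.diff yields the run lengths.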
Sample run -
In [220]: n,d,k = 10000,100,100
...: np.random.seed(0)
...: points = np.random.rand(n,d)
...: label = np.random.randint(0,k,(n))
In [221]: out0 = np.array([points[label==k_i].mean(axis = 0) for k_i in range(k)])
In [222]: np.allclose(matmul(points, label),out0)
Out[222]: True
In [223]: np.allclose(add_reduceat(points, label),out0)
Out[223]: True
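The sample run has every label present. As a hedged check of the case where some label below the maximum never occurs, the sketch below repeats both functions on a made-up input. The np.errstate guard in matmul is an addition here to silence the 0/0 warning and is not part of the functions above.

```python
import numpy as np

def matmul(points, label):
    k = label.max() + 1
    mask = label == np.arange(k)[:, None]
    # An unused label gives 0/0, which evaluates to NaN; suppress the warning.
    with np.errstate(invalid="ignore"):
        return mask.dot(points) / mask.sum(1, keepdims=True)

def add_reduceat(points, label):
    k = label.max() + 1
    sidx = label.argsort()
    ps, ls = points[sidx], label[sidx]
    cutidx = np.flatnonzero(np.r_[True, ls[:-1] != ls[1:], True])
    lens = np.diff(cutidx)
    # Rows for unused labels are never overwritten and stay NaN.
    out = np.full((k, points.shape[1]), np.nan)
    out[ls[cutidx[:-1]]] = np.add.reduceat(ps, cutidx[:-1], axis=0) / lens[:, None]
    return out

# Hypothetical input in which label 1 never occurs.
points = np.arange(10, dtype=float).reshape(5, 2)
label = np.array([0, 0, 2, 2, 2])

a = matmul(points, label)
b = add_reduceat(points, label)
# NaN != NaN, so equal_nan=True is needed to compare the empty row.
same = np.allclose(a, b, equal_nan=True)
```

Both versions leave a row of NaN for the missing label, so comparisons against them need equal_nan=True.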