I am trying to perform the following operation in NumPy without using a loop:
Can I achieve this without a loop?
With a loop, it would look like this.
import numpy as np
N=3
d=3
K=2
X=np.eye(N)
y=np.random.randint(1,K+1,N)   # labels in 1..K
M=np.zeros((K,d))
for i in np.arange(0,K):
    # rows of X whose label is i+1
    line=X[y==i+1,:]
    if line.size==0:
        # no rows for this label: leave the mean as zeros
        M[i,:]=np.zeros(d)
    else:
        M[i,:]=np.mean(line,0)
Thanks in advance.
Answer 0 (score: 3)
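This solves the problem, but it creates an intermediate K×N boolean matrix and does not use the built-in mean function. In some cases this may lead to worse performance or worse numerical stability. I let the class labels range from 0 to K-1, rather than 1 to K.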
import numpy as np
# Define constants
K,N,d = 10,1000,3
# Sample data
Y = np.random.randint(0,K-1,N) #K-1 to omit one class to test no-examples case
X = np.random.randn(N,d)
# Calculate means for each class, vectorized
# Map samples to labels by taking a logical "outer product"
mark = Y[None,:]==np.arange(0,K)[:,None]
# Count number of examples in each class
count = np.sum(mark,1)
# Avoid divide by zero if no examples
count += count==0
# Sum within each class and normalize
M = (np.dot(mark,X).T/count).T
print(M, np.shape(M), np.shape(mark))
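If the intermediate K×N boolean matrix is a concern, one possible variation is to accumulate the per-class sums directly with np.add.at and count the labels with np.bincount. This is only a sketch and not part of the answer above; the function name class_means_add_at and the seeded random generator are assumptions made for illustration, and labels are assumed to be integers in 0..K-1.

```python
import numpy as np

def class_means_add_at(X, labels, K):
    """Per-class row means of X for integer labels in [0, K); empty classes give zeros.

    Hypothetical helper for illustration only.
    """
    sums = np.zeros((K, X.shape[1]))
    # Unbuffered in-place add: row n of X is accumulated into sums[labels[n]]
    np.add.at(sums, labels, X)
    # Number of samples per class, padded with zeros up to K classes
    counts = np.bincount(labels, minlength=K)
    # Treat zero counts as 1 so empty classes stay at zero instead of dividing by zero
    return sums / np.maximum(counts, 1)[:, None]

rng = np.random.default_rng(0)      # seed chosen arbitrarily
K, N, d = 10, 1000, 3
Y = rng.integers(0, K - 1, N)       # upper bound is exclusive, so class K-1 never occurs
X = rng.standard_normal((N, d))
M = class_means_add_at(X, Y, K)
print(M.shape)
```

This needs only O(K·d) extra memory, although np.add.at can be slower than a matrix product for large N, so it is worth timing both on real data.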
Answer 1 (score: 3)
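The code essentially collects specific rows from X and sums them, for which NumPy has a built-in in np.add.reduceat. With that in focus, the steps to solve it in a vectorized way could be as follows: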
# Get sort indices of y
sidx = y.argsort()
# Collect rows off X based on their IDs so that they come in consecutive order
Xr = X[sidx]
# Get unique row IDs, start positions of each unique ID
# and their counts to be used for average calculations
unq,startidx,counts = np.unique((y-1)[sidx],return_index=True,return_counts=True)
# Add rows off Xr based on the slices signified by the start positions
vals = np.true_divide(np.add.reduceat(Xr,startidx,axis=0),counts[:,None])
# Setup output array and set row summed values into it at unique IDs row positions
out = np.zeros((K,d))
out[unq] = vals
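To check that these steps reproduce the loop from the question, they can be wrapped in a function and compared against a loop-based reference. This is a sketch under stated assumptions: the names class_means_reduceat and class_means_loop are hypothetical, labels are taken to run from 1 to K as in the question, and y is assumed to be non-empty.

```python
import numpy as np

def class_means_reduceat(X, y, K):
    """Row means of X grouped by labels y in 1..K; classes with no rows give zeros."""
    sidx = y.argsort()
    # Unique zero-based class IDs, where each one starts in the sorted order, and its size
    unq, startidx, counts = np.unique((y - 1)[sidx], return_index=True, return_counts=True)
    out = np.zeros((K, X.shape[1]))
    # Sum each contiguous run of sorted rows, then divide by the run length
    out[unq] = np.add.reduceat(X[sidx], startidx, axis=0) / counts[:, None]
    return out

def class_means_loop(X, y, K):
    """Reference implementation mirroring the loop in the question."""
    M = np.zeros((K, X.shape[1]))
    for i in range(K):
        rows = X[y == i + 1]
        if rows.size:
            M[i] = rows.mean(axis=0)
    return M

K = 2
X = np.eye(3)
y = np.array([1, 1, 1])             # class 2 has no rows
out = class_means_reduceat(X, y, K)
print(np.allclose(out, class_means_loop(X, y, K)))
```

The example deliberately leaves class 2 empty to exercise the case the question handles with np.zeros(d).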