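Vectorizing an operation in NumPy

Time: 2016-05-15 09:35:58

Tags: python numpy vectorization

I am trying to perform the following operation in NumPy without using a loop:

  • I have a matrix X of size N × d and a vector y of size N. y contains integers from 1 to K.
  • I want to obtain a matrix M of size K × d, where M[i,:] = np.mean(X[y == i,:], 0)

Can I achieve this without using a loop?

With a loop, it would look like this.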

import numpy as np

N=3
d=3 
K=2 

X=np.eye(N)
y=np.random.randint(1,K+1,N)
M=np.zeros((K,d))
for i in np.arange(0,K):
    line=X[y==i+1,:]          # rows of X whose label is i+1
    if line.size==0:
        M[i,:]=np.zeros(d)    # no samples for this class
    else:
        M[i,:]=np.mean(line,0)

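Thanks in advance.

2 answers:

Answer 0 (score: 3)

This solves the problem, but it creates an intermediate K×N boolean matrix and does not use the built-in mean function. In some cases this may lead to worse performance or worse numerical stability. I let the class labels range from 0 to K-1 rather than from 1 to K.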

import numpy as np

# Define constants
K,N,d = 10,1000,3

# Sample data
Y = np.random.randint(0,K-1,N) # upper bound K-1 omits one class to test the no-examples case
X = np.random.randn(N,d)

# Calculate means for each class, vectorized

# Map samples to labels by taking a logical "outer product"
mark = Y[None,:]==np.arange(0,K)[:,None]

# Count number of examples in each class
count = np.sum(mark,1)

# Avoid divide by zero if no examples
count += count==0

# Sum within each class and normalize
M = (np.dot(mark,X).T/count).T

print(M, np.shape(M), np.shape(mark))
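
To make the logical outer product easier to follow, here is a minimal sketch on a hand-sized input. The sample values and the helper name loop_means are made up for illustration and are not part of the original answer. It builds the same K×N mask and checks the result against a plain loop.

```python
import numpy as np

# Tiny hypothetical input: 4 samples, 2 features, labels in 0..K-1 (class 2 is empty)
K = 3
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0]])
Y = np.array([0, 1, 0, 1])

# mark[k, n] is True when sample n belongs to class k
mark = Y[None, :] == np.arange(K)[:, None]

# Per-class counts; empty classes are bumped to 1 to avoid dividing by zero
count = np.maximum(mark.sum(axis=1), 1)

# Row k of the product is the sum of the samples in class k
M = (mark.astype(float) @ X) / count[:, None]

def loop_means(X, Y, K):
    """Reference loop (illustrative helper): per-class mean, zeros for empty classes."""
    out = np.zeros((K, X.shape[1]))
    for k in range(K):
        rows = X[Y == k]
        if rows.size:
            out[k] = rows.mean(axis=0)
    return out

assert np.allclose(M, loop_means(X, Y, K))
print(M)
```

The mask has K×N entries, so this form is mainly attractive when K is small relative to the available memory.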

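Answer 1 (score: 3)

The code essentially collects specific rows of X and adds them up, and NumPy has a built-in for that in np.add.reduceat. With that in focus, the steps to solve it in a vectorized manner could be as listed below: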

# Get sort indices of y
sidx = y.argsort()

# Collect rows off X based on their IDs so that they come in consecutive order
Xr = X[sidx]

# Get unique row IDs, start positions of each unique ID
# and their counts to be used for average calculations
unq,startidx,counts = np.unique((y-1)[sidx],return_index=True,return_counts=True)

# Add rows off Xr based on the slices signified by the start positions
vals = np.true_divide(np.add.reduceat(Xr,startidx,axis=0),counts[:,None])

# Setup output array and set row summed values into it at unique IDs row positions
out = np.zeros((K,d))
out[unq] = vals
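
As a rough illustration of how np.add.reduceat sums consecutive slices, and of the steps above applied end to end, the sketch below uses a small made-up input with labels from 1 to K, as in the question. The function name group_means_reduceat and the sample values are assumptions for this example, not part of the original answer.

```python
import numpy as np

# np.add.reduceat sums the slices [0:2], [2:3] and [3:] of the input
demo = np.array([1, 2, 3, 4, 5])
slice_sums = np.add.reduceat(demo, [0, 2, 3])

def group_means_reduceat(X, y, K):
    """Mean of the rows of X per label in 1..K; rows for unused labels stay zero."""
    # Sort so that rows with the same label become contiguous
    sidx = y.argsort(kind="stable")
    Xr = X[sidx]
    # Zero-based label IDs, the start of each label's run, and the run lengths
    unq, startidx, counts = np.unique(y[sidx] - 1, return_index=True, return_counts=True)
    # Sum each contiguous run of rows, then divide by its length
    sums = np.add.reduceat(Xr, startidx, axis=0)
    out = np.zeros((K, X.shape[1]))
    out[unq] = sums / counts[:, None]
    return out

# Hypothetical input: label 2 has no samples
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array([3, 1, 3, 1])
M = group_means_reduceat(X, y, K=3)

print(slice_sums)
print(M)
```

Sorting first is what makes reduceat applicable, since it only sums contiguous runs; the cost is an O(N log N) sort rather than a K×N intermediate matrix.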