Question

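I am looking for a vectorized way to apply a function that returns a 2-D array to each row of a 2-D array, producing a 3-D array.

More specifically, I have a function that takes a vector of length p and returns a 2-D array (m by n). Here is a stylized version of my function: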

import numpy as np  
def test_func(x, m, n):
    # This function is just an example and does not do anything useful,
    # but the dimensions of its input and output are what I want to convey.
    np.random.seed(x.sum())
    return np.random.randint(5, size=(m, n))

I have 2-D input data:

t = 5
p = 6
input_data = np.arange(t*p).reshape(t, p)
input_data
Out[403]: 
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29]])

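I want to apply test_func to each row of input_data. Since test_func returns a matrix, I expect to get a 3-D (t by m by n) array. I can produce the result I want with the following code: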

output_data = np.array([test_func(x, m=3, n=2) for x in input_data])
output_data
Out[405]: 
array([[[0, 4],
        [0, 4],
        [3, 3],
        [1, 0]],

       [[1, 0],
        [1, 0],
        [4, 1],
        [2, 4]],

       [[3, 3],
        [3, 0],
        [1, 4],
        [0, 2]],

       [[2, 4],
        [2, 1],
        [3, 2],
        [3, 1]],

       [[3, 4],
        [4, 3],
        [0, 3],
        [3, 0]]])

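However, this code does not seem optimal. It has an explicit for loop, which slows it down, and it uses an intermediate list, which allocates extra memory unnecessarily. So I would like to find a vectorized solution. My best guess is the following code, but it does not work.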

output = np.apply_along_axis(test_func, m=3, n=2, axis=1, arr=input_data)
Traceback (most recent call last):

  File "<ipython-input-406-5bef44da348f>", line 1, in <module>
    output = np.apply_along_axis(test_func, m=3, n=2, axis=1, arr=input_data)

  File "C:\Anaconda\lib\site-packages\numpy\lib\shape_base.py", line 117, in apply_along_axis
    outarr[tuple(i.tolist())] = res

ValueError: could not broadcast input array from shape (3,2) into shape (3)

Could you please suggest an efficient way to solve this problem?
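One way to address only the intermediate-list concern, without removing the Python loop, is to allocate the result once and fill it row by row. The following is a minimal sketch under that assumption; `row_to_matrix` is a hypothetical stand-in with the same input and output shapes as test_func, and this is not a vectorized solution.

```python
import numpy as np

def row_to_matrix(x, m, n):
    # Hypothetical stand-in for test_func: only the shapes matter.
    np.random.seed(x.sum())
    return np.random.randint(5, size=(m, n))

t, p, m, n = 5, 6, 3, 2
input_data = np.arange(t * p).reshape(t, p)

# Allocate the (t, m, n) result once, then write each row's matrix into its slot.
output_data = np.empty((t, m, n), dtype=int)
for i, x in enumerate(input_data):
    output_data[i] = row_to_matrix(x, m=m, n=n)

# Reference result built through the intermediate list, for comparison.
reference = np.array([row_to_matrix(x, m=m, n=n) for x in input_data])
```

Both arrays hold the same values; the preallocated version simply skips the temporary list of t small arrays.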

Update

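Below is the actual function I want to apply. It performs classical multidimensional scaling. The aim of the question is not to optimize the inner workings of the function, but to find a general method for vectorizing the application of a function. In the spirit of full disclosure, though, I am including the actual function here. Note that this function only works if p == m*(m-1)/2.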

def mds_classical_scaling(v, m, n):

    # create a symmetric distance matrix from the elements in vector v
    D = np.zeros((m, m))
    D[np.triu_indices(m, k=1)] = v
    D = (D + D.T)

    # Transform the symmetric matrix
    A = -0.5 * (D**2)
    # Create centering matrix
    H = np.eye(m) - np.ones((m, m))/m
    # Doubly center A and store in B (matrix products, not elementwise)
    B = H @ A @ H

    # B should be positive definite otherwise the function
    # would not work.
    mu, V = np.linalg.eig(B)

    # indices of the eigenvalues, largest first
    ndx = (-mu).argsort()

    # calculate the point configuration from the n largest eigenvalues
    # and the corresponding eigenvectors
    Mu1 = np.diag(mu[ndx][:n])
    V1 = V[:, ndx[:n]]
    X = V1 @ np.sqrt(Mu1)

    return X

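Any performance gain I get from vectorization will be negligible compared to the cost of the actual function. My main motivation is learning :)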
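The length condition p == m*(m-1)/2 for the function above is the number of entries strictly above the diagonal of an m by m matrix, which is what np.triu_indices(m, k=1) enumerates. A small check, assuming m = 4 (so p = 6, matching the row length of input_data); the vector `v` is made-up illustration data:

```python
import numpy as np

m = 4
rows, cols = np.triu_indices(m, k=1)   # index pairs strictly above the diagonal
p = len(rows)                           # 6 for m = 4

v = np.arange(1, p + 1, dtype=float)    # illustrative vector of length p
D = np.zeros((m, m))
D[rows, cols] = v                       # fill the upper triangle row by row
D = D + D.T                             # mirror it to get a symmetric matrix
```

With any other length of `v`, the assignment into the upper triangle raises a shape error, which is why the condition on p is required.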

Answer 1

ali_m's comment is correct: to get a serious speed-up, you should be more specific about what the function does.

That said, if you still want to use np.apply_along_axis to get a (possibly) small speed-up, then consider (after rereading that function's docstring) that you can easily

1. wrap your function so that it produces a 1D array,
2. use np.apply_along_axis with that wrapper, and
3. reshape the resulting array:

```
def test_func_wrapper(*args, **kwargs):
    return test_func(*args, **kwargs).ravel()

output = np.apply_along_axis(test_func_wrapper, m=3, n=2, axis=1, arr=input_data)
np.allclose(output.reshape(5,3, -1), output_data)
# output: True
```
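The wrap, apply, and reshape steps can also be bundled into a single reusable function. The sketch below is one possible way to do that; `apply_to_rows` and `row_to_matrix` are hypothetical names introduced for illustration (the latter stands in for test_func), and the caller is assumed to know the per-row output shape in advance.

```python
import numpy as np

def apply_to_rows(func, arr, out_shape, **kwargs):
    """Hypothetical helper: apply `func` to each row of the 2-D array `arr`.

    `func` must return an array of shape `out_shape` for every row.
    """
    def flat_func(row):
        # Step 1: flatten the matrix so apply_along_axis sees a 1-D result.
        return np.asarray(func(row, **kwargs)).ravel()

    # Step 2: one flattened result per row, shape (t, prod(out_shape)).
    flat = np.apply_along_axis(flat_func, 1, arr)
    # Step 3: restore the per-row shape.
    return flat.reshape((arr.shape[0],) + tuple(out_shape))

def row_to_matrix(x, m, n):
    # Hypothetical stand-in for test_func: only the shapes matter.
    np.random.seed(x.sum())
    return np.random.randint(5, size=(m, n))

input_data = np.arange(30).reshape(5, 6)
output = apply_to_rows(row_to_matrix, input_data, (3, 2), m=3, n=2)
expected = np.array([row_to_matrix(x, m=3, n=2) for x in input_data])
```

Note that np.apply_along_axis still calls the Python function once per row, so this mainly tidies the code rather than removing the per-row overhead.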

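Note that this is a general way to speed up such loops. If you use functionality that is more specific to the actual problem, you will probably get better performance.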
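As an illustration of what "more specific to the actual problem" might look like for the classical scaling function in the question, here is a sketch that processes all rows at once with batched linear algebra. It rests on several assumptions: the name `mds_classical_scaling_batch` is hypothetical, np.linalg.eigh is swapped in for eig because the doubly centered matrix is symmetric, and the n largest eigenvalues are assumed to be non-negative (true for Euclidean distances). Eigenvector signs are arbitrary, so the coordinates may differ from a per-row computation by a sign flip per column.

```python
import numpy as np

def mds_classical_scaling_batch(V, m, n):
    """Sketch: classical scaling for every row of V at once.

    V has shape (t, p) with p == m*(m-1)/2; the result has shape (t, m, n).
    """
    t = V.shape[0]
    rows, cols = np.triu_indices(m, k=1)

    # One symmetric distance matrix per input row, shape (t, m, m).
    D = np.zeros((t, m, m))
    D[:, rows, cols] = V
    D = D + D.transpose(0, 2, 1)

    A = -0.5 * D**2
    H = np.eye(m) - np.ones((m, m)) / m
    B = H @ A @ H                      # matmul broadcasts H over the batch

    # eigh accepts stacks of symmetric matrices; eigenvalues come back ascending.
    mu, vecs = np.linalg.eigh(B)
    mu = mu[:, ::-1][:, :n]            # n largest eigenvalues, largest first
    vecs = vecs[:, :, ::-1][:, :, :n]  # matching eigenvectors as columns

    # Scale each eigenvector by the square root of its eigenvalue.
    return vecs * np.sqrt(mu)[:, None, :]

# Usage with made-up data: distances between 4 random points in the plane.
rng = np.random.RandomState(0)
pts = rng.rand(3, 4, 2)
rows, cols = np.triu_indices(4, k=1)
dist = np.linalg.norm(pts[:, rows] - pts[:, cols], axis=-1)   # shape (3, 6)
X = mds_classical_scaling_batch(dist, m=4, n=2)
recovered = np.linalg.norm(X[:, rows] - X[:, cols], axis=-1)
```

For Euclidean input like this, the pairwise distances of the recovered configuration match the input distances up to floating-point error, which gives a simple sanity check.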

NumPy: a general vectorized way to apply a function that returns a matrix to each row of a matrix
