Question

np.apply_along_axis（）函数似乎非常慢（15分钟后没有输出）。有没有一种快速的方法在长阵列上执行此功能而无需并行化操作？我特别谈论有数百万个元素的数组。

这是我想要做的一个例子。请忽略my_func的简单定义，目标不是将数组乘以55（当然这可以在适当的位置完成），而是一个例子。在实践中，my_func稍微复杂一些，需要额外的参数，因此a的每个元素都会被不同地修改，即不仅仅乘以55.

>>> def my_func(a):
...     return a[0]*55
>>> a = np.ones((200000000,1))
>>> np.apply_along_axis(my_func, 1, a)

编辑：

a = np.ones((20,1))

def my_func(a, i,j):
...     b = np.zeros((2,2))
...     b[0,0] = a[i]
...     b[1,0] = a[i]
...     b[0,1] = a[i]
...     b[1,1] = a[j]
...     return  linalg.eigh(b)


>>> my_func(a,1,1)
(array([ 0.,  2.]), array([[-0.70710678,  0.70710678],
   [ 0.70710678,  0.70710678]]))

Answer 1

np.apply_along_axis不是为了速度。

没有办法将纯Python 函数应用于Numpy数组的每个元素而不会多次调用它，缺少AST重写...

幸运的是，有解决方案：

<强>向量化

虽然这通常很难，但通常是简单的解决方案。找到某种方式来表达您的计算方式，以便对元素进行概括，这样您就可以立即处理整个矩阵。这将导致循环从Python中提升并进入大量优化的C和Fortran例程。
JITing ： Numba 和长尾小鹦鹉，在较小程度上 PyPy NumPyPy

Numba和Parakeet都处理Numpy数据结构上的JITing循环，因此如果你将循环内联到一个函数（这可以是一个包装函数），你可以获得几乎的速度提升 - 自由。这取决于所使用的数据结构。

符号评估员，例如 Theano 和 numexpr

这些允许您使用嵌入式语言来表达计算，这甚至可以比矢量化版本快得多。

Cython 和 C扩展程序

如果其他所有东西都丢失了，你总是可以手动挖掘到C.Cython隐藏了很多复杂性并且有很多可爱的魔法，所以它并不总是那么糟糕（尽管它有助于你知道你是什么这样做）。

你走了。

这是我的测试“环境”（你应该真的提供了这个：P）：

import itertools import numpy a = numpy.arange(200).reshape((200,1)) ** 2 def my_func(a, i,j): b = numpy.zeros((2,2)) b[0,0] = a[i] b[1,0] = a[i] b[0,1] = a[i] b[1,1] = a[j] return numpy.linalg.eigh(b) eigvals = {} eigvecs = {} for i, j in itertools.combinations(range(a.size), 2): eigvals[i, j], eigvecs[i, j] = my_func(a,i,j)

现在，获取所有排列而不是组合要容易得多，因为你可以这样做：

# All *permutations*, not combinations indexes = numpy.mgrid[:a.size, :a.size]

这可能看起来很浪费，但只有两倍的排列，所以这不是什么大问题。

因此我们希望使用这些索引来获取相关元素：

# Remove the extra dimension; it's not wanted here! subs = a[:,0][indexes]

然后我们可以制作我们的矩阵：

target = numpy.array([ [subs[0], subs[0]], [subs[0], subs[1]] ])

我们需要矩阵在 last 两个维度中：

target.shape #>>> (2, 2, 200, 200) target = numpy.swapaxes(target, 0, 2) target = numpy.swapaxes(target, 1, 3) target.shape #>>> (200, 200, 2, 2)

我们可以检查它是否有效：

target[10, 20] #>>> array([[100, 100], #>>> [100, 400]])

耶！

然后我们只运行numpy.linalg.eigh：

values, vectors = numpy.linalg.eigh(target)

看，它有效！

values[10, 20] #>>> array([ 69.72243623, 430.27756377]) eigvals[10, 20] #>>> array([ 69.72243623, 430.27756377])

那么我想你可能想要连接这些：

numpy.concatenate([values[row, row+1:] for row in range(len(values))]) #>>> array([[ 0.00000000e+00, 1.00000000e+00], #>>> [ 0.00000000e+00, 4.00000000e+00], #>>> [ 0.00000000e+00, 9.00000000e+00], #>>> ..., #>>> [ 1.96997462e+02, 7.78160025e+04], #>>> [ 3.93979696e+02, 7.80160203e+04], #>>> [ 1.97997475e+02, 7.86070025e+04]]) numpy.concatenate([vectors[row, row+1:] for row in range(len(vectors))]) #>>> array([[[ 1. , 0. ], #>>> [ 0. , 1. ]], #>>> #>>> [[ 1. , 0. ], #>>> [ 0. , 1. ]], #>>> #>>> [[ 1. , 0. ], #>>> [ 0. , 1. ]], #>>> #>>> ..., #>>> [[-0.70890372, 0.70530527], #>>> [ 0.70530527, 0.70890372]], #>>> #>>> [[-0.71070503, 0.70349013], #>>> [ 0.70349013, 0.71070503]], #>>> #>>> [[-0.70889463, 0.7053144 ], #>>> [ 0.7053144 , 0.70889463]]])

也可以在numpy.mgrid之后立即执行此连接循环，以减少工作量：

# All *permutations*, not combinations indexes = numpy.mgrid[:a.size, :a.size] # Convert to all *combinations* and reduce the dimensionality indexes = numpy.concatenate([indexes[:, row, row+1:] for row in range(indexes.shape[1])], axis=1) # Remove the extra dimension; it's not wanted here! subs = a[:,0][indexes] target = numpy.array([ [subs[0], subs[0]], [subs[0], subs[1]] ]) target = numpy.rollaxis(target, 2) values, vectors = numpy.linalg.eigh(target)

是的，最后一个样本就是你所需要的。

Answer 2

因此a的每个元素都有不同的修改

如果数组元素之间没有连接，那么Veedracs的回答总结了典型的策略。然而，通常可以找到矢量化方案，其大规模地加速计算。如果您提供函数本身的相关代码片段，我们可以为您提供更有帮助的答案。

修改

以下代码说明了如何对样本函数进行矢量化。虽然它不完整（块矩阵和特征值检索），但它应该为您提供一些基本的想法，如何做到这一点。查看函数中的每个矩阵和子矩阵，看看如何设置这样的计算。另外，我使用了密集矩阵，当使用a中的数百万个元素和大量索引对时，这些矩阵很可能不适合内存。但计算过程中的大多数矩阵都很稀疏。因此，您始终可以将代码转换为使用稀疏矩阵。该函数现在采用向量a和索引pairs的向量。

import numpy as np def my_func(a,pairs): #define mask matrix g=np.zeros((4,2)) g[:3,0]=1 g[3,1]=1 # k is the number of index pairs which need calculation k=pairs.shape[0] h=np.kron(np.eye(k),g) b=np.dot(h,a[pairs.ravel()[:2*k]]) # this matrix product generates your matrix b b.shape=-1,2 out = np.zeros((2*k,2*k)) # pre allocate memory of the block diagonal matrix # make block diagonal matrix for i in xrange(k): out[i*2:(i+1)*2, i*2:(i+1)*2] = b[i*2:(i+1)*2,:] res = np.linalg.eigh(out) # the eigenvalues of each 2by2 matrix are the same as the ones of one large block diagonal matrix # unfortunately eigh sorts the eigenvalues # to retrieve the correct pairs of eigenvalues # for each submatrix b, one has to inspect res[1] and pick # corresponding eigenvalues # I leave that for you, remember out=res[1] diag(res[0]) res[1].T return res #vektor a a=np.arange(20) #define index pairs for calculation pairs=np.asarray([[1,3],[2,7],[1,7],[2,3]]) print my_func(a,pairs)

numpy np.apply_along_axis功能加快了吗？

2 个答案: