I'm trying to find the fastest correlation implementation for some complex-valued data — the data is a 1-D array of length 1e5 and the kernel is ~20 samples long. Inner1d can't handle complex numbers, and FFT-based convolution isn't efficient at this kernel size, so the two approaches I tested are np.correlate and einsum.
With floats, einsum combined with a sliding-window view built via as_strided is a bit faster than np.correlate:
import numpy as np
from numpy.lib.stride_tricks import as_strided

data = np.ones(int(1e5), dtype='float64')
kernel = np.ones(20, dtype='float64')
%timeit xc1 = np.correlate(data, kernel)
100 loops, best of 3: 2.13 ms per loop
%timeit xc2 = np.einsum("ij,j->i", as_strided(data, shape=(data.shape[0]-(kernel.shape[0]-1), kernel.shape[0]), strides=data.strides * 2), kernel)
1000 loops, best of 3: 1.35 ms per loop
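For anyone reproducing this, here is a self-contained version of the float comparison (timings aside, it also checks that the two methods agree; note `data.strides * 2` is tuple repetition, giving the same byte stride for both axes of the window view):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

data = np.ones(int(1e5), dtype='float64')
kernel = np.ones(20, dtype='float64')

# Sliding-window view: row i holds data[i : i + len(kernel)].
n_out = data.shape[0] - (kernel.shape[0] - 1)
windows = as_strided(data, shape=(n_out, kernel.shape[0]),
                     strides=data.strides * 2)

xc1 = np.correlate(data, kernel)             # 'valid' mode by default
xc2 = np.einsum("ij,j->i", windows, kernel)  # dot each window with the kernel
assert np.allclose(xc1, xc2)
```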
But with complex numbers, einsum is significantly slower than np.correlate...
data = np.ones(int(1e5), dtype='complex128')
kernel = np.ones(20, dtype='complex128')
data_conj = np.conj(data)
%timeit xc1 = np.correlate(data, kernel)
100 loops, best of 3: 2.21 ms per loop
%timeit xc2 = np.einsum("ij,j->i", as_strided(data_conj, shape=(data_conj.shape[0]-(kernel.shape[0]-1), kernel.shape[0]), strides=data_conj.strides * 2), kernel)
100 loops, best of 3: 5.78 ms per loop
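A self-contained version of the complex comparison. One subtlety worth flagging: np.correlate conjugates its *second* argument, while the einsum line above conjugates the *data*, so for general complex inputs the two results differ by an overall conjugation (for this all-ones test data both come out real, so it doesn't show up in the timings):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

data = np.ones(int(1e5), dtype='complex128')
kernel = np.ones(20, dtype='complex128')
data_conj = np.conj(data)

n_out = data.shape[0] - (kernel.shape[0] - 1)
windows = as_strided(data_conj, shape=(n_out, kernel.shape[0]),
                     strides=data_conj.strides * 2)

xc1 = np.correlate(data, kernel)             # conjugates the *kernel*
xc2 = np.einsum("ij,j->i", windows, kernel)  # conjugates the *data*
# The two conventions differ by an overall conjugation.
assert np.allclose(xc2, np.conj(xc1))
```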
I think I understand why einsum takes longer than before, but how does np.correlate take barely any more time than it did with floats? Is there a complex-arithmetic trick I could exploit to speed up einsum?
For what it's worth, splitting the einsum correlation into real and imaginary parts is faster, but still considerably slower than np.correlate (note the line below only yields the real part of the complex correlation):
data_Re, data_Im = data.real, data.imag
kernel_Re, kernel_Im = kernel.real, kernel.imag
%timeit xc3 = np.einsum("ij,j->i", as_strided(data_Re, shape=(data_Re.shape[0]-(kernel_Re.shape[0]-1), kernel_Re.shape[0]), strides=data_Re.strides * 2), kernel_Re) + np.einsum("ij,j->i", as_strided(data_Im, shape=(data_Im.shape[0]-(kernel_Im.shape[0]-1), kernel_Im.shape[0]), strides=data_Im.strides * 2), kernel_Im)
100 loops, best of 3: 4.21 ms per loop
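For completeness, the full decomposition needs four real einsums — two for the real part and two for the imaginary part — following the conj(data)·kernel convention used above. A sketch (with random test data rather than the all-ones arrays, so the imaginary part is actually exercised):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

rng = np.random.default_rng(0)
data = rng.standard_normal(int(1e5)) + 1j * rng.standard_normal(int(1e5))
kernel = rng.standard_normal(20) + 1j * rng.standard_normal(20)

def windows(x, m):
    # Sliding-window view with m-sample rows.
    return as_strided(x, shape=(x.shape[0] - (m - 1), m),
                      strides=x.strides * 2)

# Contiguous real/imaginary copies (the .real/.imag views are strided).
d_re, d_im = np.ascontiguousarray(data.real), np.ascontiguousarray(data.imag)
k_re, k_im = kernel.real, kernel.imag
m = kernel.shape[0]

# conj(data) * kernel  =  (Rd*Rk + Id*Ik) + i*(Rd*Ik - Id*Rk)
re = np.einsum("ij,j->i", windows(d_re, m), k_re) + \
     np.einsum("ij,j->i", windows(d_im, m), k_im)
im = np.einsum("ij,j->i", windows(d_re, m), k_im) - \
     np.einsum("ij,j->i", windows(d_im, m), k_re)
xc3 = re + 1j * im

# np.correlate conjugates its second argument, so compare against the
# conjugate of its output.
assert np.allclose(xc3, np.conj(np.correlate(data, kernel)))
```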