Is fancy indexing much faster than numpy.take?

Asked: 2017-03-11 06:36:55

Tags: python arrays performance numpy indexing

I have read in many different places that numpy.take is a faster alternative to fancy indexing, for example here and here.

However, I am not finding that to be the case at all. Here is an example I came across while exploring my code during a debugging session:

knn_idx
Out[2]: 
array([ 3290,  5847,  7682,  6957, 22660,  5482, 22661, 10965,     7,
        1477,  7681,     3, 17541, 15717,  9139,  1475, 14251,  4400,
        7680,  9140,  4758, 22289,  7679,  8407, 20101, 15718, 15716,
        8405, 15710, 20829, 22662], dtype=uint32)
%timeit X.take(knn_idx, axis=0)
100 loops, best of 3: 3.14 ms per loop
%timeit X[knn_idx]
The slowest run took 60.61 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.48 µs per loop
X.shape
Out[5]: 
(23011, 30)
X.dtype
Out[6]: 
dtype('float64')

This suggests that fancy indexing is much faster! I get similar results when generating the indices with numpy.arange:

idx = np.arange(0, len(X), 100)
%timeit X.take(idx, axis=0)
100 loops, best of 3: 3.04 ms per loop
%timeit X[idx]
The slowest run took 9.41 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 20.7 µs per loop

Why is fancy indexing so much faster than numpy.take here? Am I hitting some kind of edge case?
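For anyone who wants to reproduce this outside an IPython session, a minimal standalone benchmark might look like the following. The random data here is made up, but it matches the shape and dtype of X above; absolute timings will of course vary by machine and numpy version.

```python
import timeit
import numpy as np

# Synthetic data matching the shape/dtype from the question.
rng = np.random.RandomState(0)
X = rng.rand(23011, 30)              # float64 array like X above
idx = np.arange(0, len(X), 100)      # every 100th row, as in the second test

# Both methods should select exactly the same rows.
assert np.array_equal(X.take(idx, axis=0), X[idx])

# Time each selection; take the best of a few repeats to reduce noise.
t_take = min(timeit.repeat(lambda: X.take(idx, axis=0), number=1000, repeat=3))
t_fancy = min(timeit.repeat(lambda: X[idx], number=1000, repeat=3))
print(f"take:  {t_take * 1e3:.3f} us per call")
print(f"fancy: {t_fancy * 1e3:.3f} us per call")
```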

I am using Python 3.6 via Anaconda; here is my numpy information in case it is relevant:

np.__version__
Out[11]: 
'1.11.3'
np.__config__.show()
blas_mkl_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\include']
blas_opt_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\include']
openblas_lapack_info:
  NOT AVAILABLE
lapack_mkl_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\include']
lapack_opt_info:
    libraries = ['mkl_core_dll', 'mkl_intel_lp64_dll', 'mkl_intel_thread_dll']
    library_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Users/pbreach/Continuum/Anaconda3\\Library\\include']

1 Answer:

Answer 0 (score: 1):

In my tests take is slightly faster; but with times this small and the "caching" warnings, I wouldn't put much stock in the difference:

In [192]: timeit X.take(idx2, axis=0).shape
The slowest run took 23.29 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.66 µs per loop
In [193]: timeit X[idx2,:].shape
The slowest run took 16.75 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.58 µs per loop

But your index array is uint32. That worked for indexing, but take gave a casting error; so my idx2 is your array converted with astype(int).
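The conversion the answer describes might look like this (the casting error with uint32 indices was seen on the numpy versions in this thread; newer releases may accept unsigned indices directly, so the astype is hedged as a compatibility step):

```python
import numpy as np

# Small illustrative array; knn_idx mimics the uint32 indices in the question.
X = np.arange(12.0).reshape(4, 3)
knn_idx = np.array([3, 1, 2], dtype=np.uint32)

# Convert to the platform integer type before calling take.
idx2 = knn_idx.astype(int)
rows = X.take(idx2, axis=0)

# The result matches plain fancy indexing with the original uint32 array.
assert np.array_equal(rows, X[knn_idx])
```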

With the arange idx, the times are 11.5 µs and 16 µs.

Note that my timings include .shape; I'm not entirely sure whether that makes a difference.

I don't know why you are getting ms times. It feels more like a timing issue than an actual difference in take.

I don't expect the libraries, BLAS, etc. to make a difference. The underlying task is basically the same: step through the data buffer and copy out the selected bytes. There are no complex calculations to farm out. But I haven't studied the C code.
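Consistent with that "copy out the selected bytes" description, both operations return a fresh copy of the selected rows rather than a view into X, which a quick check confirms:

```python
import numpy as np

X = np.arange(20.0).reshape(5, 4)
idx = np.array([0, 2, 4])

a = X[idx]               # fancy (advanced) indexing
b = X.take(idx, axis=0)  # numpy.take along axis 0

# Mutating either result leaves X untouched, so both must be copies.
a[0, 0] = -1.0
b[0, 0] = -2.0
assert X[0, 0] == 0.0
```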

Numpy version '1.12.0', on Linux, on a 4 GB refurbished desktop.