I installed Miniconda on a Raspberry Pi 3 B+, and I am trying to optimize matrix multiplication as much as possible. This is what I currently get:
In [5]: import numpy as np
In [6]: A = np.random.random([1000, 1000])
In [7]: B = np.random.random([1000, 1000])
In [8]: %timeit A @ B
5.51 s ± 291 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
That looks really slow to me.
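To put that timing in perspective, here is a rough sketch that converts it into throughput. It assumes the usual estimate of about 2·n³ floating-point operations for a dense n × n product; `gflops` and `time_matmul` are hypothetical helper names, not part of NumPy.

```python
import time


def gflops(n, seconds):
    """Approximate throughput of an n x n by n x n product that took `seconds`."""
    # Assumption: a dense product costs about 2 * n**3 floating-point operations.
    return 2 * n**3 / seconds / 1e9


def time_matmul(n=1000, repeats=3):
    """Best-of-`repeats` wall time in seconds for A @ B (hypothetical helper)."""
    import numpy as np  # imported here so gflops() can be used without NumPy

    A = np.random.random([n, n])
    B = np.random.random([n, n])
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        A @ B
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # The 5.51 s measurement reported above, for n = 1000.
    print(f"{gflops(1000, 5.51):.2f} GFLOP/s")  # prints "0.36 GFLOP/s"
```

Calling `gflops(1000, time_matmul())` on the Pi would give the same figure for a fresh measurement, which makes it easier to compare BLAS builds.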
My configuration:
In [9]: np.__config__.show()
atlas_3_10_blas_info:
NOT AVAILABLE
openblas_info:
NOT AVAILABLE
atlas_blas_threads_info:
NOT AVAILABLE
lapack_mkl_info:
NOT AVAILABLE
atlas_3_10_blas_threads_info:
NOT AVAILABLE
atlas_3_10_info:
NOT AVAILABLE
lapack_opt_info:
define_macros = [('ATLAS_INFO', '"\\"3.10.3\\""')]
include_dirs = ['/usr/include/atlas']
library_dirs = ['/usr/lib/atlas-base/atlas', '/usr/lib/atlas-base']
language = f77
libraries = ['lapack', 'f77blas', 'cblas', 'atlas', 'f77blas', 'cblas']
atlas_blas_info:
define_macros = [('HAVE_CBLAS', None), ('ATLAS_INFO', '"\\"3.10.3\\""')]
include_dirs = ['/usr/include/atlas']
library_dirs = ['/usr/lib/atlas-base']
language = c
libraries = ['f77blas', 'cblas', 'atlas', 'f77blas', 'cblas']
blas_mkl_info:
NOT AVAILABLE
blas_opt_info:
define_macros = [('HAVE_CBLAS', None), ('ATLAS_INFO', '"\\"3.10.3\\""')]
include_dirs = ['/usr/include/atlas']
library_dirs = ['/usr/lib/atlas-base']
language = c
libraries = ['f77blas', 'cblas', 'atlas', 'f77blas', 'cblas']
atlas_3_10_threads_info:
NOT AVAILABLE
blis_info:
NOT AVAILABLE
openblas_lapack_info:
NOT AVAILABLE
openblas_clapack_info:
NOT AVAILABLE
atlas_threads_info:
NOT AVAILABLE
atlas_info:
define_macros = [('ATLAS_INFO', '"\\"3.10.3\\""')]
include_dirs = ['/usr/include/atlas']
library_dirs = ['/usr/lib/atlas-base/atlas', '/usr/lib/atlas-base']
language = f77
libraries = ['lapack', 'f77blas', 'cblas', 'atlas', 'f77blas', 'cblas']
accelerate_info:
NOT AVAILABLE
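I also noticed that the multiplication runs on a single core. Is there a way to parallelize it? I tried the tips from the following question, but without success.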
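One way to check the single-core observation without watching a system monitor is to compare CPU time with wall-clock time, sketched below. `effective_cores` is a hypothetical helper. It relies on `time.process_time()` summing CPU time over all threads of the process, so a ratio near 1 suggests one busy core and a ratio near 4 would suggest all four cores of the Pi are in use.

```python
import time


def effective_cores(fn):
    """Run fn() and return CPU time divided by wall time (hypothetical helper)."""
    start_wall = time.perf_counter()
    start_cpu = time.process_time()  # CPU time summed over all threads
    fn()
    cpu = time.process_time() - start_cpu
    wall = time.perf_counter() - start_wall
    return cpu / wall


if __name__ == "__main__":
    # A pure-Python busy loop keeps roughly one core busy.
    ratio = effective_cores(lambda: sum(range(5_000_000)))
    print(f"busy loop: about {ratio:.1f} cores")
```

With the matrices from the session above, `effective_cores(lambda: A @ B)` would report the ratio for the multiplication itself.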