For some reason numpy's QR decomposition is consistently faster than scipy's, which seems odd, since scipy is supposed to include everything numpy does plus more advanced functionality. Any idea why?
import numpy as np
import numpy.linalg as nla
import scipy.linalg as la
A = np.random.randn(4000, 4000)
%timeit -n 3 -r 3 Q,R = nla.qr(A)
%timeit -n 3 -r 3 Q,R = la.qr(A)
%timeit -n 3 -r 3 Q,R = nla.qr(A)
%timeit -n 3 -r 3 Q,R = la.qr(A)
3 loops, best of 3: 1.17 s per loop
3 loops, best of 3: 1.21 s per loop
3 loops, best of 3: 1.05 s per loop
3 loops, best of 3: 1.21 s per loop
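As a quick sanity check (an aside, not part of the original timings), one can confirm that both calls actually reconstruct A, so the default modes are doing comparable work:
# Sketch: verify both factorizations reproduce A; individual Q/R entries may differ in sign.
Qn, Rn = nla.qr(A)
Qs, Rs = la.qr(A)
print(np.allclose(Qn.dot(Rn), A))  # expected: True
print(np.allclose(Qs.dot(Rs), A))  # expected: True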
The difference is even more pronounced with mode='raw'.
%timeit -n 3 -r 3 Q = nla.qr(A, mode='raw')
%timeit -n 3 -r 3 Q = la.qr(A, mode='raw')
%timeit -n 3 -r 3 Q = nla.qr(A, mode='raw')
%timeit -n 3 -r 3 Q = la.qr(A, mode='raw')
3 loops, best of 3: 440 ms per loop
3 loops, best of 3: 612 ms per loop
3 loops, best of 3: 436 ms per loop
3 loops, best of 3: 758 ms per loop
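One caveat when reading these numbers (an aside, not from the original post): the two 'raw' modes do not return the same kind of object, so the calls are not doing byte-for-byte identical work. A minimal sketch to inspect that, reusing A from above:
out_np = nla.qr(A, mode='raw')  # numpy: Householder reflectors plus tau scalars
out_sp = la.qr(A, mode='raw')   # scipy: raw LAPACK-style output (layout differs from numpy's)
print(type(out_np), len(out_np))
print(type(out_sp), len(out_sp))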
I am using Intel MKL, and I have the latest development versions.
numpy version: 1.11.0.dev0+90a1a9f
scipy version: 0.17.0.dev0+8003fab
NumPy config:
blas_opt_info:
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
mkl_info:
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
lapack_opt_info:
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
openblas_lapack_info:
    NOT AVAILABLE
lapack_mkl_info:
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
blas_mkl_info:
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    libraries = ['mkl_rt', 'pthread']
SciPy config:
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
mkl_info:
    libraries = ['mkl_rt', 'pthread']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
openblas_lapack_info:
    NOT AVAILABLE
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    include_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/include']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    library_dirs = ['/opt/intel/compilers_and_libraries_2016/linux/mkl/lib/intel64']
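For reference, a version and configuration dump like the above can be reproduced with the show_config helpers shipped with both libraries (the exact formatting depends on the installed versions):
import numpy as np
import scipy

print("numpy version:", np.__version__)
print("scipy version:", scipy.__version__)
np.show_config()     # prints numpy's BLAS/LAPACK build info
scipy.show_config()  # prints scipy's BLAS/LAPACK build info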
Answer (score: 0)
Very interesting post! Having run into a similar question, I came up with the following experiment, which does not rely on timeit but instead explores various matrix shapes.
It does not seem to confirm the hypothesis in this particular environment, but it may still be of interest as a starting point for further experiments?
"""
Compares Scipy and Numpy QR execution times for various n x m random matrices
"""
import time
from scipy import linalg
import numpy as np
import matplotlib.pyplot as plt
from timeit import timeit
n_feature_grid = [100, 500, 1000, 2000, 4000, 7500, 10000]
max_timeit_nb = 100
m_ratio_grid = [0.05, 0.1, 0.3, 1.0]
exec_times_scipy = np.empty((len(n_feature_grid), len(m_ratio_grid), max_timeit_nb)) * np.nan
exec_times_numpy = np.empty((len(n_feature_grid), len(m_ratio_grid), max_timeit_nb)) * np.nan
rng = np.random.RandomState(69)
for i, n in enumerate(n_feature_grid):
    for j, m_ratio in enumerate(m_ratio_grid):
        m = int(np.ceil(m_ratio * n))
        if m * n < 100 * 100:
            timeit_nb = max_timeit_nb
        elif m * n < 1000 * 100:
            timeit_nb = 50
        elif m * n < 3000 * 10000:
            timeit_nb = 10
        else:
            timeit_nb = 5
        print("Experiment for matrix %i x %i, repeated %i times" % (n, m, timeit_nb))
        for k, nb in enumerate(range(timeit_nb)):
            X = rng.randn(n, m)
            # numpy
            start_time = time.perf_counter()
            Qn, Rn = np.linalg.qr(X, mode='reduced')
            exec_times_numpy[i, j, k] = time.perf_counter() - start_time
            # scipy
            start_time = time.perf_counter()
            Qs, Rs = linalg.qr(X, mode='economic')
            exec_times_scipy[i, j, k] = time.perf_counter() - start_time
            np.testing.assert_equal(Qs, Qn)
            np.testing.assert_equal(Rs, Rn)
avg_exec_times_scipy = exec_times_scipy.mean(axis=2)
std_exec_times_scipy = exec_times_scipy.std(axis=2)
avg_exec_times_numpy = exec_times_numpy.mean(axis=2)
std_exec_times_numpy = exec_times_numpy.std(axis=2)
avg_exec_times_ratio = np.nanmean((exec_times_scipy / exec_times_numpy), axis=2)
std_exec_times_ratio = np.nanstd((exec_times_scipy / exec_times_numpy), axis=2)
# debug
# np.count_nonzero(np.isnan((exec_times_scipy / exec_times_numpy)), axis=2)
# Plot results
fig, ax = plt.subplots(1, figsize=(16, 12))
for k, m_ratio in enumerate(m_ratio_grid):
    ax.errorbar(n_feature_grid, avg_exec_times_ratio[:, k], yerr=std_exec_times_ratio[:, k],
                marker='x', linestyle='-',  # color='r',
                label='scipy/numpy(m=%.2f*n)' % m_ratio)
ax.legend(loc='upper left')
ax.set_ylabel("Ratio scipy/numpy QR exec time.")
ax.set_xlabel("n_features")
ax.set_title("scipy/numpy execution time ratio for QR on a n x m random matrix")
ax.plot([n_feature_grid[0], n_feature_grid[-1]], [1., 1.], linestyle='--', color='k')
plt.show()
This produces the figure below, where for small matrices the ratio clearly does not exceed 1 very often, whereas for large non-square matrices it tends (though less markedly) to be above 1:
The matrix sizes and numbers of repetitions appear in the log:
Experiment for matrix 100 x 5, repeated 100 times
Experiment for matrix 100 x 10, repeated 100 times
Experiment for matrix 100 x 30, repeated 100 times
Experiment for matrix 100 x 100, repeated 50 times
Experiment for matrix 500 x 25, repeated 50 times
Experiment for matrix 500 x 50, repeated 50 times
Experiment for matrix 500 x 150, repeated 50 times
Experiment for matrix 500 x 500, repeated 10 times
Experiment for matrix 1000 x 50, repeated 50 times
Experiment for matrix 1000 x 100, repeated 10 times
Experiment for matrix 1000 x 300, repeated 10 times
Experiment for matrix 1000 x 1000, repeated 10 times
Experiment for matrix 2000 x 100, repeated 10 times
Experiment for matrix 2000 x 200, repeated 10 times
Experiment for matrix 2000 x 600, repeated 10 times
Experiment for matrix 2000 x 2000, repeated 10 times
Experiment for matrix 4000 x 200, repeated 10 times
Experiment for matrix 4000 x 400, repeated 10 times
Experiment for matrix 4000 x 1200, repeated 10 times
Experiment for matrix 4000 x 4000, repeated 10 times
Experiment for matrix 7500 x 375, repeated 10 times
Experiment for matrix 7500 x 750, repeated 10 times
Experiment for matrix 7500 x 2250, repeated 10 times
Experiment for matrix 7500 x 7500, repeated 5 times
Experiment for matrix 10000 x 500, repeated 10 times
Experiment for matrix 10000 x 1000, repeated 10 times
Experiment for matrix 10000 x 3000, repeated 5 times
Experiment for matrix 10000 x 10000, repeated 5 times
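In addition to the plot, the same ratios can be summarised as text (a small sketch reusing the avg/std arrays computed in the script above):
# Mean scipy/numpy time ratio per cell; values above 1 mean scipy was slower on average.
for j, m_ratio in enumerate(m_ratio_grid):
    for i, n in enumerate(n_feature_grid):
        print("n=%5i  m=%.2f*n  ratio=%.2f +/- %.2f"
              % (n, m_ratio, avg_exec_times_ratio[i, j], std_exec_times_ratio[i, j]))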
The environment I used is already quite old:
numpy 1.19.1 py36h5510c5b_0
numpy-base 1.19.1 py36ha3acd2a_0
python 3.6.8 h9f7ef89_7
scipy 1.5.0 py36h9439919_0
(conda 4.8.5 on Windows 10)
Note that I did not use the 'raw' mode because, if I understand correctly, it makes the resulting matrices not numpy-compliant. You may want to change that to see its effect.
This investigation grew out of a scikit-learn experiment, see this ticket.