Why is CPU (with direct Theano binding to BLAS) slower?

Date: 2016-01-15 04:11:53

Tags: python theano openblas

I installed Theano on my Windows machine following the steps at https://www.kaggle.com/c/otto-group-product-classification-challenge/forums/t/13973/a-few-tips-to-install-theano-on-windows-64-bits, then ran theano.misc.check_blas.test() to measure BLAS speed. It takes about 10 seconds:

In [2]: theano.misc.check_blas.test()
Some Theano flags:
    blas.ldflags= -LC:\\openblas -lopenblas
    compiledir= C:\Users\WAWEIMIN\AppData\Local\Theano\compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_61_Stepping_4_GenuineIntel-3.4.3-64
    floatX= float64
    device= cpu
Some OS information:
    sys.platform= win32
    sys.version= 3.4.3 |Anaconda 2.3.0 (64-bit)| (default, Mar  6 2015, 12:06:10) [MSC v.1600 64 bit (AMD64)]
    sys.prefix= C:\Users\WAWEIMIN\SciSoft\Anaconda
Some environment variables:
    MKL_NUM_THREADS= None
    OMP_NUM_THREADS= None
    GOTO_NUM_THREADS= None

Numpy config: (used when the Theano flag "blas.ldflags" is empty)
lapack_opt_info:
    define_macros = [('SCIPY_MKL_H', None)]
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd', 'mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
blas_opt_info:
    define_macros = [('SCIPY_MKL_H', None)]
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
openblas_lapack_info:
  NOT AVAILABLE
blas_mkl_info:
    define_macros = [('SCIPY_MKL_H', None)]
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
lapack_mkl_info:
    define_macros = [('SCIPY_MKL_H', None)]
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd', 'mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
mkl_info:
    define_macros = [('SCIPY_MKL_H', None)]
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
Numpy dot module: numpy.core._dotblas
Numpy location: C:\Users\WAWEIMIN\SciSoft\Anaconda\lib\site-packages\numpy\__init__.py
Numpy version: 1.9.2
Out[2]: (9.959995985031128, 'CPU (with direct Theano binding to blas)')
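In the environment-variable part of this output, MKL_NUM_THREADS, OMP_NUM_THREADS and GOTO_NUM_THREADS all show as None, meaning they are unset, so each BLAS library uses its own default thread count. The minimal sketch below collects the same variables yourself. It also includes OPENBLAS_NUM_THREADS, an addition on my part: OpenBLAS reads that variable, but check_blas does not print it here. The helper name `blas_thread_env` is hypothetical.

```python
import os

# Thread-count variables printed by check_blas, plus OPENBLAS_NUM_THREADS
# (added as an assumption; OpenBLAS also reads it).
BLAS_THREAD_VARS = (
    "MKL_NUM_THREADS",
    "OMP_NUM_THREADS",
    "GOTO_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
)


def blas_thread_env(environ=None):
    """Return {variable: value or None} for the BLAS thread variables."""
    if environ is None:
        environ = os.environ
    return {name: environ.get(name) for name in BLAS_THREAD_VARS}


if __name__ == "__main__":
    # Print in the same "NAME= value" style as check_blas
    for name, value in blas_thread_env().items():
        print("    %s= %s" % (name, value))
```

These variables are usually read when the BLAS library is loaded. To pin a thread count for an experiment, set them in the shell before starting Python, not after numpy or theano has been imported.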

However, if I remove these lines from the .theanorc.txt file,

[blas]
ldflags=-LC:\\openblas -lopenblas

the result is (only the last output line is shown):

(2.91823678434, 'CPU (without direct Theano binding to blas but with numpy/scipy binding to blas)')
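Expressed as a ratio, the two check_blas timings differ by a factor of roughly 3.4. A quick check of that arithmetic follows; the function name `slowdown` is only for illustration.

```python
def slowdown(slow_seconds, fast_seconds):
    """How many times longer the slow run took than the fast run."""
    return slow_seconds / fast_seconds


# check_blas timings from the two runs above
with_direct_binding = 9.959995985031128
without_direct_binding = 2.91823678434
print("%.2fx" % slowdown(with_direct_binding, without_direct_binding))  # → 3.41x
```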

Why is the direct Theano binding to BLAS so much slower than running without it? Am I using the wrong BLAS library?

(Following the steps in the link above, I downloaded OpenBLAS-v0.2.14-Win64-int32.zip, available from http://sourceforge.net/projects/openblas/files/v0.2.14/OpenBLAS-v0.2.14-Win64-int32.zip/download, and saved it locally to C:\openblas.)
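The .theanorc.txt file is INI-style, so you can inspect the [blas] section quoted above with Python's standard configparser module to confirm that the -L directory matches where OpenBLAS was saved. The minimal sketch below assumes the file content shown in the question. The helper names `read_blas_ldflags` and `split_ldflags` are hypothetical, and the splitting only handles the simple "-L<dir> -l<lib>" form used here.

```python
import configparser

# The [blas] section from .theanorc.txt, as quoted in the question
THEANORC_TEXT = r"""
[blas]
ldflags=-LC:\\openblas -lopenblas
"""


def read_blas_ldflags(text):
    """Return blas.ldflags from INI-style text, or '' if the key is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # fallback='' is returned when the [blas] section or the key is missing
    return parser.get("blas", "ldflags", fallback="")


def split_ldflags(ldflags):
    """Split ldflags into library directories (-L) and library names (-l)."""
    dirs, libs = [], []
    for token in ldflags.split():
        if token.startswith("-L"):
            dirs.append(token[2:])
        elif token.startswith("-l"):
            libs.append(token[2:])
    return dirs, libs


if __name__ == "__main__":
    flags = read_blas_ldflags(THEANORC_TEXT)
    print("blas.ldflags=", flags)
    print(split_ldflags(flags))
```

configparser does not unescape backslashes, so the value is returned exactly as written in the file, matching the `blas.ldflags= -LC:\\openblas -lopenblas` line that check_blas prints.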

I also tested with the following script:

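import numpy as np
import time
import theano

# Show which BLAS flags Theano is configured with
print('blas.ldflags=', theano.config.blas.ldflags)

A = np.random.rand(1000, 10000).astype(theano.config.floatX)
B = np.random.rand(10000, 1000).astype(theano.config.floatX)

# Time a single NumPy matrix product
np_start = time.time()
AB = A.dot(B)
np_end = time.time()

# Compile the equivalent Theano function (compilation is not timed)
X, Y = theano.tensor.matrices('XY')
mf = theano.function([X, Y], X.dot(Y))

# Time a single call of the compiled Theano function
t_start = time.time()
tAB = mf(A, B)
t_end = time.time()

print("NP time: %f[s], theano time: %f[s] (times should be close when run on CPU!)" % (
    np_end - np_start, t_end - t_start))
# Check that both products agree
print("Result difference: %f" % (np.abs(AB - tAB).max(), ))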
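Each measurement in the script above times a single call, so one-off costs and background noise can distort it. The minimal sketch below is a more robust harness built only on the standard library `timeit` module. It does a few untimed warm-up calls, then repeats the call several times and keeps the best per-call time. The helper name `best_time` is hypothetical; in the script you would pass it `lambda: A.dot(B)` and `lambda: mf(A, B)`.

```python
import timeit


def best_time(fn, repeat=5, number=3, warmup=1):
    """Return the best average seconds-per-call of fn over `repeat` trials.

    Each trial calls fn `number` times; `warmup` untimed calls run first.
    """
    for _ in range(warmup):
        fn()
    trials = timeit.repeat(fn, repeat=repeat, number=number)
    return min(trials) / number


if __name__ == "__main__":
    # Stand-in workload; with the script above you would time
    # `lambda: A.dot(B)` and `lambda: mf(A, B)` instead.
    work = lambda: sum(i * i for i in range(100000))
    print("best: %f[s] per call" % best_time(work))
```

Taking the minimum rather than the mean follows the advice in the timeit documentation: higher values are usually caused by other processes interfering, not by the code being measured.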
print("Result difference: %f" % (np.abs(AB - tAB).max(), ))

The result (with the OpenBLAS binding) is also slower than NumPy:

blas.ldflags= -LC:\\openblas -lopenblas
NP time: 0.358800[s], theano time: 1.328000[s] (times should be close when run on CPU!)
Result difference: 0.000000
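A dense (M×K)·(K×N) matrix product costs about 2·M·K·N floating-point operations, so the timings can be converted to throughput. For the 1000×10000 by 10000×1000 product in the script, that is 2·10^10 operations. The short sketch below converts the reported numbers; the function name `gflops` is only illustrative.

```python
def gflops(m, k, n, seconds):
    """Throughput in GFLOP/s of an (m x k) @ (k x n) product taking `seconds`."""
    flops = 2.0 * m * k * n  # one multiply and one add per inner-product term
    return flops / seconds / 1e9


# Sizes and timings from the script output above
M, K, N = 1000, 10000, 1000
print("NumPy : %.1f GFLOP/s" % gflops(M, K, N, 0.358800))  # → 55.7 GFLOP/s
print("Theano: %.1f GFLOP/s" % gflops(M, K, N, 1.328000))  # → 15.1 GFLOP/s
```

The Numpy config printed by check_blas lists only MKL libraries. That suggests the NumPy timings reflect MKL, while the Theano timings with blas.ldflags set reflect the OpenBLAS build linked through -lopenblas.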

0 answers:

No answers.