Numpy, Scipy: np.dot behaves very differently on two machines when NaNs are present

Asked: 2018-02-07 20:56:46

Tags: python numpy scipy openblas

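I have two machines: my own and a server. The task is to compute the cosine distance between many vectors and one given vector. After normalization, the cosine similarity reduces to a dot product: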

cosine(u,v) = 1.0 - np.dot(u, v) / (norm(u) * norm(v))
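
The identity can be checked numerically. The sketch below uses only NumPy and made-up random data (the names `u`, `v` and `cosine_distance` are illustrative and not taken from the original code). It shows that once both vectors have unit norm, the cosine distance is one minus a plain dot product.

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance as defined above: 1 - u.v / (|u| |v|)."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.RandomState(0)  # fixed seed so the check is reproducible
u = rng.randn(300)
v = rng.randn(300)

# Normalize once up front ...
u_unit = u / np.linalg.norm(u)
v_unit = v / np.linalg.norm(v)

# ... and the distance becomes one minus a plain dot product.
full = cosine_distance(u, v)
fast = 1.0 - np.dot(u_unit, v_unit)
print(np.isclose(full, fast))
```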

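So I decided to normalize my vectors up front and get a 3x speedup. There is one catch, though: some of the vectors can consist entirely of np.nan values: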

nullvect = [np.nan]*300

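This is an important part of my model. Consider the following experiment, which compares scipy.spatial.distance.cosine (via cdist), np.dot, and the dot product taken directly from the BLAS library: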

import numpy as np

from scipy.spatial.distance import cdist

# import BLAS version (scipy.linalg must be imported explicitly)
import scipy.linalg
# Numpy somehow does not want to use the optimized version, so call BLAS gemv directly
gemv = scipy.linalg.get_blas_funcs("gemv")
dot = lambda x, y: gemv(1, x, y)

# generate data
vectors = np.random.randn(1000,300)
# normalize and write contiguously
vectors = np.ascontiguousarray([v/np.linalg.norm(v) for v in vectors])
# set some of them to nans
nullvect = [np.nan]*300
# place a lot of nan vectors
vecs = []
for vector in vectors:
    if np.random.randint(2) == 1:
        vecs += [nullvect]
    else:
        vecs += [vector]        
vecs = np.ascontiguousarray(vecs)
# normalize target vector as well
vec = np.random.randn(300)
vec = vec/np.linalg.norm(vec)

import time

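def eval_func():
    # NOTE: (end-start)*10000 prints units of 0.1 ms, not ms (that would be *1000);
    # the outputs shown in this post were produced with this factor.
    # cosine distances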
    start = time.time()
    dists = cdist([vec], vecs, metric='cosine').ravel()
    end = time.time()
    print("Overall computation took"+ str((end-start)*10000)+" ms")
    # np.dot distances
    start = time.time()
    dists = 1-vecs.dot(vec.T)
    end = time.time()
    print("Overall computation took"+ str((end-start)*10000)+" ms")
    # blas distances
    start = time.time()
    dists = 1-dot(vecs, vec.T)
    end = time.time()
    print("Overall computation took"+ str((end-start)*10000)+" ms")

eval_func()

On my local machine everything works as expected: the dot product is well over 3x faster:

Overall computation took 5.55992126465 ms
Overall computation took 0.340938568115 ms
Overall computation took 11.9209289551 ms

But on the server something goes wrong: np.dot is complete garbage!

Overall computation took 5.462169647216797 ms
Overall computation took 81.59875869750977 ms
Overall computation took 15.769004821777344 ms

Both machines run Ubuntu 16.04, and numpy 1.14 was installed via pip after installing OpenBLAS via apt-get.

The configuration looks the same on both machines:

>>> np.__config__.show()
lapack_mkl_info:
  NOT AVAILABLE
blas_opt_info:
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
    libraries = ['openblas', 'openblas']
lapack_opt_info:
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
    libraries = ['openblas', 'openblas']
blis_info:
  NOT AVAILABLE
openblas_lapack_info:
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
    libraries = ['openblas', 'openblas']
blas_mkl_info:
  NOT AVAILABLE
openblas_info:
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
    language = c
    libraries = ['openblas', 'openblas']


# update-alternatives --config libblas.so.3
There are 2 choices for the alternative libblas.so.3 (providing /usr/lib/libblas.so.3).

  Selection    Path                                 Priority   Status
------------------------------------------------------------
* 0            /usr/lib/openblas-base/libblas.so.3   40        auto mode
  1            /usr/lib/libblas/libblas.so.3         10        manual mode
  2            /usr/lib/openblas-base/libblas.so.3   40        manual mode

The server runs Python 3.6.2, my local machine runs 3.5.2. Without NaNs in the data, everything is fine.
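
Since the timings are fine without NaNs, one possible workaround (not tested on the server in question) is to keep the all-NaN rows away from BLAS entirely: compute the dot product only for rows without NaN and fill in NaN for the rest. `masked_distances` is a hypothetical helper name, and the tiny arrays are made up for illustration.

```python
import numpy as np

def masked_distances(vecs, vec):
    """1 - vecs.dot(vec), computed only for rows that contain no NaN."""
    valid = ~np.isnan(vecs).any(axis=1)      # True for rows without NaN
    dists = np.full(vecs.shape[0], np.nan)   # NaN rows keep a NaN distance
    dists[valid] = 1.0 - vecs[valid].dot(vec)
    return dists

# Tiny made-up example: row 1 is a "null vector" of NaNs.
vecs = np.array([[1.0, 0.0], [np.nan, np.nan], [0.0, 1.0]])
vec = np.array([1.0, 0.0])
dists = masked_distances(vecs, vec)
print(dists)
```

Boolean indexing copies the valid rows, so this trades an extra copy for never handing NaNs to the BLAS routine. Whether that is a net win would have to be measured.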

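The server is a Xeon with 8 processors, my computer is an i7 laptop with 4 cores (so the server should certainly beat me thanks to parallelism).

EDIT: If I use vectors instead of vecs (no NaNs), the np.dot timings are also quite random: they can be larger or smaller than the cosine timing, anywhere from 1 ms to 7 ms. The results of the other two algorithms are stable.

Some runs without NaNs (vectors):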

Overall computation took5.519390106201172 ms
Overall computation took1.5783309936523438 ms
Overall computation took8.22305679321289 ms
>>> eval_func()
Overall computation took5.586147308349609 ms
Overall computation took2.3889541625976562 ms
Overall computation took8.22305679321289 ms
>>> eval_func()
Overall computation took5.342960357666016 ms
Overall computation took8.533000946044922 ms
Overall computation took8.122920989990234 ms
>>> eval_func()
Overall computation took5.633831024169922 ms
Overall computation took1.8835067749023438 ms
Overall computation took8.115768432617188 ms
>>> eval_func()
Overall computation took5.548000335693359 ms
Overall computation took1.7952919006347656 ms
Overall computation took8.14199447631836 ms
>>> eval_func()
Overall computation took5.550384521484375 ms
Overall computation took2.02178955078125 ms
Overall computation took8.759498596191406 ms
>>> eval_func()
Overall computation took5.5217742919921875 ms
Overall computation took13.997554779052734 ms
Overall computation took8.320808410644531 ms
>>> eval_func()
Overall computation took5.490779876708984 ms
Overall computation took2.6559829711914062 ms
Overall computation took9.52005386352539 ms
>>> eval_func()
Overall computation took5.395412445068359 ms
Overall computation took2.0003318786621094 ms
Overall computation took8.37564468383789 ms
>>> eval_func()
Overall computation took5.440711975097656 ms
Overall computation took2.224445343017578 ms
Overall computation took8.237361907958984 ms

Some runs with NaNs (vecs):

>>> eval_func()
Overall computation took5.4454803466796875 ms
Overall computation took26.26657485961914 ms
Overall computation took7.8868865966796875 ms
>>> eval_func()
Overall computation took5.366802215576172 ms
Overall computation took19.97232437133789 ms
Overall computation took7.729530334472656 ms
>>> eval_func()
Overall computation took5.433559417724609 ms
Overall computation took19.736289978027344 ms
Overall computation took7.9345703125 ms
>>> eval_func()
Overall computation took5.447864532470703 ms
Overall computation took18.61095428466797 ms
Overall computation took8.14676284790039 ms
>>> eval_func()
Overall computation took5.567073822021484 ms
Overall computation took1.926422119140625 ms
Overall computation took7.729530334472656 ms
>>> eval_func()
Overall computation took5.640983581542969 ms
Overall computation took20.401477813720703 ms
Overall computation took7.8868865966796875 ms
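
Single-shot time.time() measurements like the ones above are noisy, so repeating each call and looking at the minimum, median and maximum can help separate jitter from a systematic slowdown. The following is a minimal sketch on made-up data of the same shape. `time_call` is a hypothetical helper, and it converts seconds to milliseconds with a factor of 1000.

```python
import time
import numpy as np

def time_call(func, repeats=50):
    """Run func() `repeats` times; return (min, median, max) in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)  # s -> ms
    return min(samples), float(np.median(samples)), max(samples)

# Made-up data with the same shapes as in the experiment above.
rng = np.random.RandomState(0)
vecs = rng.randn(1000, 300)
vec = rng.randn(300)

lo, med, hi = time_call(lambda: 1 - vecs.dot(vec))
print("np.dot: min %.3f ms, median %.3f ms, max %.3f ms" % (lo, med, hi))
```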

EDIT2:

Setting

import os
os.environ['OMP_NUM_THREADS']='1'

did not help.

Maybe I could switch to another library?
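
One thing worth ruling out: OpenBLAS reads its thread-count variables when the shared library is loaded, so they are only expected to take effect if they are set before numpy is first imported in the process. The sketch below assumes a fresh interpreter. `OPENBLAS_NUM_THREADS` is OpenBLAS's own variable and is set alongside `OMP_NUM_THREADS` in case the build does not use OpenMP.

```python
import os

# Must run before the first `import numpy` anywhere in the process;
# changing these after numpy is loaded is assumed to have no effect.
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"

import numpy as np  # imported only after the environment is prepared

a = np.ones((4, 3))
b = np.ones(3)
result = a.dot(b)
print(result)
```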

EDIT3:

On both machines:

>>> import inspect
>>> inspect.getmodule(np.dot)
<module 'numpy.core.multiarray' from '/usr/local/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so'>

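However, it looks like this can be different.

EDIT4: The biggest surprise: np.inner() seems to work fine (it is equivalent to np.dot for matrix-vector multiplication, though slightly slower):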

def eval_func():
    # cosine distances
    start = time.time()
    dists = cdist([vec], vecs, metric='cosine').ravel()
    end = time.time()
    print("Overall computation took"+ str((end-start)*10000)+" ms")
    # np.dot distances
    start = time.time()
    dists = 1-vecs.dot(vec.T)
    end = time.time()
    print("Overall computation took"+ str((end-start)*10000)+" ms")
    # blas distances
    start = time.time()
    dists = 1-dot(vecs, vec.T)
    end = time.time()
    print("Overall computation took"+ str((end-start)*10000)+" ms")


    # np.inner distances
    start = time.time()
    dists = 1-np.inner(vecs, vec.T)
    end = time.time()
    print("Overall computation took"+ str((end-start)*10000)+" ms")

eval_func()


>>> eval_func()
Overall computation took5.500316619873047 ms
Overall computation took37.69874572753906 ms
Overall computation took8.03232192993164 ms
Overall computation took0.6771087646484375 ms
>>> eval_func()
Overall computation took5.517005920410156 ms
Overall computation took1.811981201171875 ms
Overall computation took7.5435638427734375 ms
Overall computation took0.6580352783203125 ms
>>> eval_func()
Overall computation took5.688667297363281 ms
Overall computation took38.02061080932617 ms
Overall computation took7.672309875488281 ms
Overall computation took0.5245208740234375 ms
>>> eval_func()
Overall computation took5.614757537841797 ms
Overall computation took38.6500358581543 ms
Overall computation took8.053779602050781 ms
Overall computation took0.6699562072753906 ms
>>> eval_func()
Overall computation took5.481243133544922 ms
Overall computation took39.615631103515625 ms
Overall computation took7.724761962890625 ms
Overall computation took0.5125999450683594 ms
>>> eval_func()
Overall computation took5.562305450439453 ms
Overall computation took2.0051002502441406 ms
Overall computation took7.450580596923828 ms
Overall computation took0.5030632019042969 ms
>>> eval_func()
Overall computation took5.252361297607422 ms
Overall computation took3.883838653564453 ms
Overall computation took7.433891296386719 ms
Overall computation took0.705718994140625 ms
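
Before swapping one routine for another, it is worth confirming that they agree on data that contains NaN rows. The sketch below uses made-up data and compares `vecs.dot`, `np.inner` and `np.einsum`. The last is included on the assumption (worth verifying for your numpy version) that by default it runs its own loops rather than calling BLAS.

```python
import numpy as np

rng = np.random.RandomState(0)
vecs = rng.randn(6, 300)
vecs[::2] = np.nan            # every other row is an all-NaN "null vector"
vec = rng.randn(300)

via_dot = vecs.dot(vec)
via_inner = np.inner(vecs, vec)
via_einsum = np.einsum("ij,j->i", vecs, vec)

# equal_nan=True makes NaN entries at the same position count as equal.
same = (np.allclose(via_dot, via_inner, equal_nan=True)
        and np.allclose(via_dot, via_einsum, equal_nan=True))
print(same)
```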

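EDIT5: So I replaced cdist with np.inner in my program, and it got slower! Below is a comparison of the timings in ms for cdist (first) and np.inner (second). The last tuple is the shape of the vector array: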

2018-02-07 23:12:37,609 0.6237030029296875 20.6756591796875 (1381, 300)
2018-02-07 23:12:37,612 0.6270408630371094 0.7233619689941406 (1381, 300)
2018-02-07 23:12:37,614 2.070188522338867 4.981040954589844 (2879, 300)
2018-02-07 23:12:37,630 19.373416900634766 28.1679630279541 (2387, 300)
2018-02-07 23:12:37,641 0.6058216094970703 28.65886688232422 (1007, 300)
2018-02-07 23:12:37,642 1.2252330780029297 12.749433517456055 (2879, 300)
2018-02-07 23:12:37,649 29.929637908935547 76.58910751342773 (9151, 300)
2018-02-07 23:12:37,649 1.0759830474853516 15.654802322387695 (2387, 300)
2018-02-07 23:12:37,653 0.5512237548828125 6.318569183349609 (1007, 300)
2018-02-07 23:12:37,664 0.4935264587402344 6.627321243286133 (1007, 300)
2018-02-07 23:12:37,670 1.275777816772461 11.78598403930664 (2879, 300)
2018-02-07 23:12:37,679 0.47206878662109375 10.010242462158203 (1007, 300)
2018-02-07 23:12:37,679 0.46634674072265625 16.28565788269043 (922, 300)
2018-02-07 23:12:37,683 0.7443428039550781 29.844284057617188 (1381, 300)
2018-02-07 23:12:37,686 26.117801666259766 36.93413734436035 (9929, 300)
2018-02-07 23:12:37,693 1.2125968933105469 13.280630111694336 (2879, 300)
2018-02-07 23:12:37,698 0.4277229309082031 6.974458694458008 (922, 300)
2018-02-07 23:12:37,698 0.5869865417480469 8.502006530761719 (1381, 300)
2018-02-07 23:12:37,708 0.48828125 22.032976150512695 (1007, 300)
2018-02-07 23:12:37,710 0.4360675811767578 6.644725799560547 (922, 300)
2018-02-07 23:12:37,713 0.6110668182373047 8.195638656616211 (1381, 300)
2018-02-07 23:12:37,719 1.1529922485351562 12.442350387573242 (2879, 300)
2018-02-07 23:12:37,720 0.4718303680419922 5.584001541137695 (1007, 300)
2018-02-07 23:12:37,725 0.46443939208984375 13.801336288452148 (922, 300)
2018-02-07 23:12:37,729 0.5846023559570312 2.321958541870117 (1381, 300)
2018-02-07 23:12:37,737 0.4661083221435547 4.40669059753418 (922, 300)
2018-02-07 23:12:37,746 0.09751319885253906 0.033855438232421875 (4, 300)
2018-02-07 23:12:37,749 0.4220008850097656 6.46662712097168 (922, 300)
2018-02-07 23:12:37,750 0.07700920104980469 0.030040740966796875 (4, 300)
2018-02-07 23:12:37,751 0.16117095947265625 26.17502212524414 (62, 300)
2018-02-07 23:12:37,751 0.14591217041015625 19.942522048950195 (37, 300)
2018-02-07 23:12:37,752 0.0743865966796875 0.029325485229492188 (4, 300)
2018-02-07 23:12:37,753 0.1418590545654297 1.1310577392578125 (37, 300)
2018-02-07 23:12:37,753 0.07486343383789062 0.019788742065429688 (4, 300)
2018-02-07 23:12:37,754 0.10466575622558594 0.9295940399169922 (37, 300)
2018-02-07 23:12:37,755 0.07486343383789062 0.029802322387695312 (4, 300)
2018-02-07 23:12:37,755 0.10275840759277344 0.8199214935302734 (37, 300)
2018-02-07 23:12:37,757 0.10943412780761719 0.7965564727783203 (37, 300)
2018-02-07 23:12:37,758 0.11110305786132812 0.8292198181152344 (37, 300)
2018-02-07 23:12:37,758 0.07224082946777344 0.029325485229492188 (4, 300)
2018-02-07 23:12:37,765 8.567333221435547 22.03059196472168 (9929, 300)
2018-02-07 23:12:37,771 0.12874603271484375 17.62080192565918 (62, 300)
2018-02-07 23:12:37,776 0.12159347534179688 2.850055694580078 (62, 300)
2018-02-07 23:12:37,782 4.766225814819336 44.742584228515625 (9151, 300)
2018-02-07 23:12:37,783 0.14066696166992188 4.610538482666016 (62, 300)
2018-02-07 23:12:37,786 0.12326240539550781 0.9648799896240234 (62, 300)
2018-02-07 23:12:37,788 0.12063980102539062 0.9310245513916016 (62, 300)
2018-02-07 23:12:37,814 0.26988983154296875 26.521682739257812 (428, 300)

I just don't understand what is going on...

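EDIT6: Reinstalled the docker image with the ATLAS library, but the freshly installed numpy still says in np.__config__.show() that it uses OpenBLAS, which does not even exist on the system.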
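
A possible explanation, stated as an assumption rather than something confirmed here: numpy wheels installed with pip ship their own bundled copy of OpenBLAS inside the package, so np.__config__.show() would report the wheel's build-time configuration rather than the system BLAS. On Linux, the libraries actually mapped into the process can be listed from /proc/self/maps. `loaded_blas_libraries` is a hypothetical helper, and the keyword list is a guess at common library names.

```python
import os
import numpy as np

def loaded_blas_libraries():
    """Return paths of BLAS-like shared objects mapped into this process."""
    maps_path = "/proc/self/maps"          # Linux-only
    if not os.path.exists(maps_path):
        return []
    keywords = ("blas", "lapack", "atlas", "mkl")
    found = set()
    with open(maps_path) as maps:
        for line in maps:
            # Fields: address perms offset dev inode [pathname]
            parts = line.rstrip("\n").split(None, 5)
            if len(parts) == 6:
                path = parts[5]
                name = os.path.basename(path).lower()
                if any(k in name for k in keywords):
                    found.add(path)
    return sorted(found)

np.ones((2, 2)).dot(np.ones(2))  # make sure the linear algebra backend is loaded
for path in loaded_blas_libraries():
    print(path)
```

If the printed path points inside the numpy package directory rather than /usr/lib, the system-wide BLAS selected through update-alternatives is not the one np.dot is calling.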

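EDIT7: Rebuilt numpy to use ATLAS. Now the np.dot and np.inner results are finally stable, but only about twice as fast as scipy cdist. So it seems to be an OpenBLAS problem on the Amazon EC2 server, where the results are completely random.

I have also checked outside of docker, and it is the same.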

0 answers:

No answers