Numpy.dot bug? Inconsistent NaN behavior

Time: 2014-04-29 20:21:02

Tags: python numpy nan blas

I noticed inconsistent behavior in numpy.dot when nans and zeros are involved.

Can anybody make sense of it? Is this a bug? Is this specific to the dot function?

from numpy import *

0*nan, nan*0
# => (nan, nan)  # makes sense

#1
a = array([[0]])
b = array([[nan]])
dot(a, b)
# => array([[ nan]])  # OK

#2 -- adding a value to b. the first value in the result is
#     not expected to be affected.
a = array([[0]])
b = array([[nan, 1]])
dot(a, b)
# => array([[ 0.,  0.]])
# EXPECTED : array([[ nan,  0.]])
# (also happens in 1.6.2 and 1.8.0)
# Also, as @Bill noted, a*b works as expected, but not dot(a,b)

#3 -- changing a from 0 to 1, the first value in the result is
#     not expected to be affected.
a = array([[1]])
b = array([[nan, 1]])
dot(a, b)
# => array([[ nan,  1.]])  # OK

#4 -- changing shape of a, changes nan in result
a = array([[0],[0]])
b = array([[ nan,  1.]])
dot(a, b)
# => array([[ 0.,  0.],
#           [ 0.,  0.]])
# EXPECTED : array([[ nan,  0.],
#                   [ nan,  0.]])
# (works as expected in 1.6.2 and 1.8.0)

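I'm using numpy v1.6.1, 64bit, running on linux (also tested on v1.6.2). I also tested on v1.8.0 on windows 32bit (so I can't tell whether the differences are due to the version, the OS, or the architecture).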

Case #4 seems to work correctly in v1.6.2 and v1.8.0, but case #2 does not...

Edit: @seberg pointed out that this is a BLAS issue, so here is the information about the BLAS installation that I found by running:

from numpy.distutils.system_info import get_info; get_info('blas_opt')

1.6.1 linux 64bit

/usr/lib/python2.7/dist-packages/numpy/distutils/system_info.py:1423: UserWarning:
    Atlas (http://math-atlas.sourceforge.net/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [atlas]) or by setting
    the ATLAS environment variable.
  warnings.warn(AtlasNotFoundError.__doc__)
{'libraries': ['blas'], 'library_dirs': ['/usr/lib'], 'language': 'f77', 'define_macros': [('NO_ATLAS_INFO', 1)]}

1.8.0 windows 32bit (anaconda)

c:\Anaconda\Lib\site-packages\numpy\distutils\system_info.py:1534: UserWarning:
    Blas (http://www.netlib.org/blas/) sources not found.
    Directories to search for the sources can be specified in the
    numpy/distutils/site.cfg file (section [blas_src]) or by setting
    the BLAS_SRC environment variable.
  warnings.warn(BlasSrcNotFoundError.__doc__)
{}

(I personally don't know what to make of it.)

1 Answer:

Answer 0 (score: 3)

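I think, as seberg suggested, that this is an issue with the BLAS library in use. If you look at how numpy.dot is implemented (here and here), you will find a call to cblas_dgemm() for the double-precision matrix-times-matrix case.

This C program, which reproduces some of your examples, gives the same output as yours when linked against "plain" BLAS, and the right answer when linked against ATLAS.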

#include <stdio.h>
#include <math.h>

#include "cblas.h"

void onebyone(double a11, double b11, double expectc11)
{
  enum CBLAS_ORDER order=CblasRowMajor;
  enum CBLAS_TRANSPOSE transA=CblasNoTrans;
  enum CBLAS_TRANSPOSE transB=CblasNoTrans;
  int M=1;
  int N=1;
  int K=1;
  double alpha=1.0;
  double A[1]={a11};
  int lda=1;
  double B[1]={b11};
  int ldb=1;
  double beta=0.0;
  double C[1];
  int ldc=1;

  cblas_dgemm(order, transA, transB,
              M, N, K,
              alpha,A,lda,
              B, ldb,
              beta, C, ldc);

  printf("dot([ %.18g],[%.18g]) -> [%.18g]; expected [%.18g]\n",a11,b11,C[0],expectc11);
}

void onebytwo(double a11, double b11, double b12,
              double expectc11, double expectc12)
{
  enum CBLAS_ORDER order=CblasRowMajor;
  enum CBLAS_TRANSPOSE transA=CblasNoTrans;
  enum CBLAS_TRANSPOSE transB=CblasNoTrans;
  int M=1;
  int N=2;
  int K=1;
  double alpha=1.0;
  double A[]={a11};
  int lda=1;
  double B[2]={b11,b12};
  int ldb=2;
  double beta=0.0;
  double C[2];
  int ldc=2;

  cblas_dgemm(order, transA, transB,
              M, N, K,
              alpha,A,lda,
              B, ldb,
              beta, C, ldc);

  printf("dot([ %.18g],[%.18g, %.18g]) -> [%.18g, %.18g]; expected [%.18g, %.18g]\n",
         a11,b11,b12,C[0],C[1],expectc11,expectc12);
}

int
main()
{
  onebyone(0, 0, 0);
  onebyone(2, 3, 6);
  onebyone(NAN, 0, NAN);
  onebyone(0, NAN, NAN);
  onebytwo(0, 0,0, 0,0);
  onebytwo(2, 3,5, 6,10);
  onebytwo(0, NAN,0, NAN,0);
  onebytwo(NAN, 0,0, NAN,NAN);
  return 0;
}

Output with BLAS:

dot([ 0],[0]) -> [0]; expected [0]
dot([ 2],[3]) -> [6]; expected [6]
dot([ nan],[0]) -> [nan]; expected [nan]
dot([ 0],[nan]) -> [0]; expected [nan]
dot([ 0],[0, 0]) -> [0, 0]; expected [0, 0]
dot([ 2],[3, 5]) -> [6, 10]; expected [6, 10]
dot([ 0],[nan, 0]) -> [0, 0]; expected [nan, 0]
dot([ nan],[0, 0]) -> [nan, nan]; expected [nan, nan]

Output with ATLAS:

dot([ 0],[0]) -> [0]; expected [0]
dot([ 2],[3]) -> [6]; expected [6]
dot([ nan],[0]) -> [nan]; expected [nan]
dot([ 0],[nan]) -> [nan]; expected [nan]
dot([ 0],[0, 0]) -> [0, 0]; expected [0, 0]
dot([ 2],[3, 5]) -> [6, 10]; expected [6, 10]
dot([ 0],[nan, 0]) -> [nan, 0]; expected [nan, 0]
dot([ nan],[0, 0]) -> [nan, nan]; expected [nan, nan]
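The "expected" values in these listings follow from IEEE 754 arithmetic, where 0 * nan is nan. As a sanity check, here is a minimal pure-Python sketch of a textbook triple loop with no shortcuts, applied to two of the inputs above. The helper name naive_dot is made up for illustration and is not part of NumPy or BLAS.

```python
import math


def naive_dot(a, b):
    """Textbook matrix product on lists of lists, with no shortcuts.

    Illustrative helper only; not how NumPy or any BLAS computes dot().
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                # Every product is evaluated, so 0 * nan contributes nan.
                out[i][j] += a[i][k] * b[k][j]
    return out


nan = math.nan
r1 = naive_dot([[0.0]], [[nan, 0.0]])  # same inputs as dot([ 0],[nan, 0])
r2 = naive_dot([[nan]], [[0.0, 0.0]])  # same inputs as dot([ nan],[0, 0])
print(r1, r2)
```

With every product evaluated, the first call yields nan in the first position and 0 in the second, and the second call yields nan in both, which matches the "expected" column.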

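BLAS seems to behave as expected when the first operand contains a NaN, and incorrectly when the first operand is zero and the second contains a NaN.

Anyway, I don't think this bug is in the NumPy layer; it's in BLAS. It appears possible to work around it by using ATLAS instead.

The output above was generated on Ubuntu 14.04, using the Ubuntu-provided gcc, BLAS, and ATLAS.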
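The outputs do not show why plain BLAS behaves this way. One mechanism that would produce exactly this asymmetry is an optimization that skips a term whenever the left-hand factor is exactly zero, on the assumption that 0 * x is always 0. The sketch below is a hypothetical pure-Python model of such a shortcut. It is a guess for illustration, has not been checked against any BLAS source, and the name dot_skipping_zeros is invented.

```python
import math


def dot_skipping_zeros(a, b):
    """Matrix product that skips terms whose left factor is exactly zero.

    Hypothetical model of a zero-skipping shortcut; not taken from BLAS.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            if a[i][k] == 0:
                # Shortcut: assumes 0 * x == 0, which is false when x is nan.
                continue
            for j in range(cols):
                out[i][j] += a[i][k] * b[k][j]
    return out


nan = math.nan
zero_first = dot_skipping_zeros([[0.0]], [[nan, 0.0]])
nan_first = dot_skipping_zeros([[nan]], [[0.0, 0.0]])
print(zero_first, nan_first)
```

In this model a zero in the first operand hides a NaN in the second, because the product is never evaluated, while a NaN in the first operand is never skipped (nan == 0 is false) and so propagates. That is the same pattern as the plain BLAS output, but it remains an assumption about the cause.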