Strange performance issue with AMD's ACML BLAS/LAPACK library

Posted: 2013-09-24 16:23:54

Tags: c performance linear-algebra lapack blas

I asked this question on the AMD developer forums a few days ago but haven't received an answer yet. Maybe someone here has some insight: http://devgurus.amd.com/thread/167492

I'm running ACML version 5.3.1 (libacml_mp, gfortran_fma4 build) on an Opteron 6348 processor under Ubuntu 12.04.

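If I call dpotrf (Cholesky decomposition) first, the performance of a subsequent dsyev call (eigendecomposition) drops dramatically, by roughly a factor of 10. It makes no sense to me why this happens. Maybe I need to clear some kind of cache or something similar.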
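The "clear some kind of cache" idea can be tested directly. Below is a minimal sketch, not part of ACML, that evicts the CPU caches by writing and then reading a buffer assumed to be larger than the last-level cache. The 64 MiB size is an assumption rather than a measured figure for the Opteron 6348, and `flush_caches` and `FLUSH_BYTES` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer size, ASSUMED to exceed the last-level cache.
 * Adjust it for the actual machine. */
#define FLUSH_BYTES (64u * 1024u * 1024u)

/* Streams through a large buffer so that previously cached data is evicted
 * from the CPU caches. Returns a checksum so the compiler cannot discard the
 * read pass, or -1.0 if the buffer could not be allocated. */
double flush_caches(void) {
  size_t n = FLUSH_BYTES / sizeof(double);
  double *buf = malloc(n * sizeof(double));
  double sum = 0.0;
  size_t i;

  if (buf == NULL) return -1.0;              /* allocation failed */
  for (i = 0; i < n; ++i) buf[i] = (double) i;  /* write pass */
  for (i = 0; i < n; ++i) sum += buf[i];        /* read pass */
  assert(sum > 0.0);                         /* sanity check: both passes ran */
  free(buf);
  return sum;
}
```

Calling `flush_caches()` just before the second `t0 = clock();` would show whether cold or polluted CPU caches account for the slowdown. It only affects the hardware caches; it cannot reset any internal state kept by the library itself.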

Here is a simple C program that reproduces the problem:

#include <stdio.h>
#include <stdlib.h>
#include <acml.h>
#include <time.h>

int main(void) {
  double * x = malloc(1000000 * sizeof(double));
  double * y = malloc(1000000 * sizeof(double));
  double * eig0 = malloc(1000000 * sizeof(double));
  double * eig1 = malloc(1000000 * sizeof(double));
  double * eigw = malloc(1000 * sizeof(double));
  double * chol = malloc(1000000 * sizeof(double));

  clock_t t0,t1;
  int info;
  int i;

  // generate a random matrix
  for(i = 0; i<1000000; ++i){
    x[i] = rand() / (double) RAND_MAX;
  }

  // compute y = xx^T so that y is symmetric positive definite
  dgemm('N','T',1000,1000,1000,1,x,1000,x,1000,0,y,1000);

  // make a copy of y for cholesky and eigen decompositions
  for(i = 0; i<1000000; ++i){
    chol[i] = y[i];
    eig0[i] = y[i];
    eig1[i] = y[i];
  }

  // first eigenvalue test
  t0 = clock();
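  dsyev('V','U',1000,eig0,1000,eigw,&info);
  t1 = clock();
  if (info != 0) fprintf(stderr, "dsyev failed: info = %d\n", info);
  // clock() reports processor time; with CLOCKS_PER_SEC == 1000000 this prints milliseconds
  printf("Eigen decomposition time: %ld\n", (long)((t1 - t0) / 1000));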

  // cholesky
  dpotrf('U',1000,chol,1000,&info);
  if (info != 0) fprintf(stderr, "dpotrf failed: info = %d\n", info);

  // second eigenvalue test, after cholesky
  t0 = clock();
  dsyev('V','U',1000,eig1,1000,eigw,&info);
  t1 = clock();
  if (info != 0) fprintf(stderr, "dsyev failed: info = %d\n", info);
  printf("Eigen decomposition time: %ld\n", (long)((t1 - t0) / 1000));
  free(x); free(y); free(eig0); free(eig1); free(eigw); free(chol);
  return 0;
}
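One thing worth ruling out: `clock()` returns processor time, not elapsed time, and on Linux/glibc it sums the CPU time of all threads in the process. Since `libacml_mp` is the multithreaded build, the numbers printed above are CPU time accumulated across threads rather than wall-clock time. The sketch below defines two hypothetical helpers, `wall_seconds` (POSIX `clock_gettime` with `CLOCK_MONOTONIC`) and `cpu_seconds` (a thin wrapper around `clock()`), that could be read around each `dsyev` call to report both figures side by side.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict C99 */
#include <assert.h>
#include <time.h>

/* Elapsed (wall-clock) seconds from a monotonic clock. */
double wall_seconds(void) {
  struct timespec ts;
  int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
  assert(rc == 0);
  (void) rc;  /* avoid an unused-variable warning when NDEBUG is defined */
  return (double) ts.tv_sec + (double) ts.tv_nsec * 1e-9;
}

/* Processor time in seconds as reported by clock(); on Linux/glibc this is
 * the total over all threads of the process. */
double cpu_seconds(void) {
  return (double) clock() / CLOCKS_PER_SEC;
}
```

If, after dpotrf, the CPU figure grows while the wall-clock figure stays roughly flat, the extra CPU time is being consumed by additional threads rather than lengthening the eigendecomposition itself; if both grow, the slowdown is real. This is a diagnostic suggestion, not a known explanation of the behavior.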

Here is the output:

Eigen decomposition time: 8120
Eigen decomposition time: 95140

If I comment out the dpotrf line, it works as expected:

Eigen decomposition time: 8150
Eigen decomposition time: 8210
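For reference, the ratio of the second reported time to the first in each run is:

```latex
\frac{95140}{8120} \approx 11.7 \quad \text{(with dpotrf)}, \qquad
\frac{8210}{8150} \approx 1.007 \quad \text{(without dpotrf)}
```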
