我正在尝试对维度为N = 80,000的矩阵NxN的对角线化(所有特征值和特征向量)。我在具有〜3TB RAM的计算机上使用intel MKL 2018,并使用intel LAPACKE_dsyevr例程。我在下面附加一个代码,该代码创建大小为NxN的矩阵,然后将其传递给dsyevr例程以查找特征值。虽然代码适用于N直到〜50000,但对于 N〜80000,并引发内存异常。这不能归因于我的机器的内存,因为代码在因段错误而崩溃之前很高兴在例程dsyevr例程中停留了很长时间。
请注意,由于我的主要代码结构,我确实需要将矩阵声明为std :: vector。由于MKL C例程需要矩阵作为指针,因此使指针指向向量的第一个元素。
代码在这里
// this uses relatively robust representations
#include<iostream>
#include<fstream>
#include<stdlib.h>
#include<stdio.h>
#include<vector>
#include<iterator>
#include<math.h>
#include "mkl_lapacke.h"
#include "mkl.h"
/* Auxiliary routines prototypes */
extern void print_matrix( char* desc, MKL_INT m, MKL_INT n, double* a, MKL_INT lda );
extern void fileprint_matrix( char* desc, MKL_INT m, MKL_INT n, double* a, MKL_INT lda );
/* Main program */
int main() {
unsigned int k;
unsigned int i,j;
MKL_INT N, NSQ, LDA, LDZ, NSELECT, info;
MKL_INT il, iu, m;
double abstol, vl, vu;
N=80000;
NSQ=N*N;
LDA=N;
LDZ=N;
/* allocate space for the arrays */
std::vector<MKL_INT> isuppz(2*N,0);
std::vector<double> w(N,0.0);
std::vector<double> z(NSQ,0.0);
std::vector<double> a(NSQ,0.0);
std::cout<<"size of vector "<<a.size()<<std::endl;
std::cout<<"maximum capacity of vector "<<a.max_size()<<std::endl;
// make the matrix
k=0;
for(size_t pos=0; pos < a.size(); pos++){
i=k/N; j=k%N;
a.at(pos) = 1.0/(i+j+1) + (i+j)*exp(-0.344*(i+j));
k++;
}
double* A = &a[0];
/* print full matrix */
//print_matrix("Full matrix", N, N, A, LDA);
/* Executable statements */
printf( "LAPACKE_dsyevr (column-major, high-level) RRR Example Program Results\n" );
/* Solve eigenproblem */
abstol=-1;
MKL_INT* ISUPPZ = &isuppz[0];
double* W = &w[0];
double* Z = &z[0];
info = LAPACKE_dsyevr( LAPACK_COL_MAJOR, 'V', 'A', 'U', N, A, LDA,
vl, vu, il, iu, abstol, &m, W, Z, LDZ, ISUPPZ );
printf("Back from the routine and working properly. Info = %d. Eval[0]=%e\n",info,W[0]);
//printf("No of eigenvalues = %d\n",m);
/* Check for convergence */
if( info > 0 ) {
printf( "The algorithm failed to compute eigenvalues.\n" );
exit( 1 );
}
isuppz.clear();
w.clear();
z.clear();
a.clear();
exit( 0 );
}
答案 0 :(得分:1)
好的,我的朋友。你能指望什么?让我首先祝贺您解决这个有趣的问题。我很想自己解决。话虽如此:
a
是80000*80000*8 = 47.7GB
。这只是为了存储矩阵。但这还不是全部。您还拥有z
和一些可笑的80000双精度向量。因此,您首先分配96GB。您的进程有多少可用RAM?但这似乎不是您的问题。您的问题是LWORK
并带有W。文档要求LWORK
至少需要1.26*N
:
LWORK is INTEGER
The dimension of the array WORK. LWORK >= max(1,26*N).
For optimal efficiency, LWORK >= (NB+6)*N,
where NB is the max of the blocksize for DSYTRD and DORMTR
returned by ILAENV.
If LWORK = -1, then a workspace query is assumed; the routine
only calculates the optimal size of the WORK array, returns
this value as the first entry of the WORK array, and no error
message related to LWORK is issued by XERBLA.
但是,实际上,您应该做的是通过将算法一次设置为LWORK
来从算法中为-1
进行计算,如上所述。它将为您在LWORK
的第一个单元格中为WORK
提供适当的数字:
类似这样的东西:
LWORK=-1;
std::vector<double> w(1);
// Workspace estimation
info = LAPACKE_dsyevr( LAPACK_COL_MAJOR, 'V', 'A', 'U', N, A, LDA,
vl, vu, il, iu, abstol, &m, W, Z, LDZ, ISUPPZ );
LWORK = std::static_cast<int>(work.front());
try {
work.resize(LWORK);
} catch (const std::exception& e) {
std::cerr << "Failed to allocate work space." << std::endl;
exit 1;
}
// Actual calculation
info = LAPACKE_dsyevr( LAPACK_COL_MAJOR, 'V', 'A', 'U', N, A, LDA,
vl, vu, il, iu, abstol, &m, W, Z, LDZ, ISUPPZ );
完全忘记了我想提到的最重要的事情之一:确保确实需要双精度。 LAPACKE_fsyevr
应该需要RAM并且速度要快两倍。