在我为MPI编写C ++包装器的过程中,我遇到MPI_Test()
中的分段错误,其原因是我无法弄清楚。
以下代码是一个最小的崩溃示例,可以使用mpic++ -std=c++11 -g -o test test.cpp && ./test
编译和运行:
#include <stdlib.h>
#include <stdio.h>
#include <memory>
#include <mpi.h>
class Environment {
public:
static Environment &getInstance() {
static Environment instance;
return instance;
}
static bool initialized() {
int ini;
MPI_Initialized(&ini);
return ini != 0;
}
static bool finalized() {
int fin;
MPI_Finalized(&fin);
return fin != 0;
}
private:
Environment() {
if(!initialized()) {
MPI_Init(NULL, NULL);
_initialized = true;
}
}
~Environment() {
if(!_initialized)
return;
if(finalized())
return;
MPI_Finalize();
}
bool _initialized{false};
public:
Environment(Environment const &) = delete;
void operator=(Environment const &) = delete;
};
class Status {
private:
std::shared_ptr<MPI_Status> _mpi_status;
MPI_Datatype _mpi_type;
};
class Request {
private:
std::shared_ptr<MPI_Request> _request;
int _flag;
Status _status;
};
int main() {
auto &m = Environment::getInstance();
MPI_Request r;
MPI_Status s;
int a;
MPI_Test(&r, &a, &s);
Request r2;
printf("b\n");
}
基本上,Environment
类是围绕MPI_Init
和MPI_Finalize
的单例包装器。当程序退出时,MPI将被最终确定,并且第一次实例化类时,将调用MPI_Init
。然后我在main()
函数中做一些MPI,包含一些其他简单的包装器对象。
上面的代码崩溃了(在我的机器上,OpenMPI和Linux)。但是,它适用于我
Request
或Status
的任何私人成员(甚至int _flag;
)printf("b\n");
auto &m = Environment::getInstance();
替换为MPI_Init()
。这些点之间似乎没有联系,我也不知道在哪里寻找分段错误。
堆栈跟踪是:
[pc13090:05978] *** Process received signal ***
[pc13090:05978] Signal: Segmentation fault (11)
[pc13090:05978] Signal code: Address not mapped (1)
[pc13090:05978] Failing at address: 0x61
[pc13090:05978] [ 0] /usr/lib/libpthread.so.0(+0x11dd0)[0x7fa9cf818dd0]
[pc13090:05978] [ 1] /usr/lib/openmpi/libmpi.so.40(ompi_request_default_test+0x16)[0x7fa9d0357326]
[pc13090:05978] [ 2] /usr/lib/openmpi/libmpi.so.40(MPI_Test+0x31)[0x7fa9d03970b1]
[pc13090:05978] [ 3] ./test(+0xb7ae)[0x55713d1aa7ae]
[pc13090:05978] [ 4] /usr/lib/libc.so.6(__libc_start_main+0xea)[0x7fa9cf470f4a]
[pc13090:05978] [ 5] ./test(+0xb5ea)[0x55713d1aa5ea]
[pc13090:05978] *** End of error message ***
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node pc13090 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------