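I have some simple code that compiles fine with the GNU compiler. I then switched to the PGI compiler, and now the program fails. I am compiling on a desktop with a Xeon E5 CPU (16 processors) and two GPU cards, a Titan and a 1080.
I tested with the following simple hello world program: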
#include <mpi.h>
#include <iostream>
using namespace std;

int main(int argc, char **argv) {
    int procid, numprocs;
    MPI_Init(&argc, &argv);                    // start the MPI runtime
    MPI_Comm_rank(MPI_COMM_WORLD, &procid);    // rank of this process
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);  // total number of processes
    cout << "hello world" << endl;
    MPI_Finalize();
    return 0;
}
Here is the result with mpic++; the error is as follows:
[WorkStation:14395] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 388
[WorkStation:14395] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 166
--------------------------------------------------------------------------
Sorry! You were supposed to get help about:
orte_init:startup:internal-failure
But I couldn't open the help file:
/proj/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/share/openmpi/help-orte-runtime: No such file or directory. Sorry!
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Sorry! You were supposed to get help about:
mpi_init:startup:internal-failure
But I couldn't open the help file:
/proj/pgi/linux86-64-llvm/2019/mpi/openmpi-3.1.3/share/openmpi/help-mpi-runtime.txt: No such file or directory. Sorry!
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[WorkStation:14395] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
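I tried the same thing on my laptop, which has the same setup. There it compiles fine, with no problems. I started to suspect that having multiple GPUs is causing the problem. Since each GPU has a different PGI target, I tried both -ta=tesla:cc70 and -ta=tesla:cc60. Neither works.
I don't know how to debug this. I can add more information if needed.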