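I'm writing an MPI program for a parallel computing class. I have the code working and it outputs the correct result, but when I call MPI_Finalize with more than one process, I get a bus error. I'm running it on OS X through the PTP environment in Eclipse. The error is as follows: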
[Fruity:49034] *** Process received signal ***
[Fruity:49034] Signal: Bus error (10)
[Fruity:49034] Signal code: (2)
[Fruity:49034] Failing at address: 0x100336d7e
[Fruity:49034] [ 0] 2 libSystem.B.dylib 0x00007fff865cc1ba _sigtramp + 26
[Fruity:49034] [ 1] 3 ??? 0x0000000000000000 0x0 + 0
[Fruity:49034] [ 2] 4 libSystem.B.dylib 0x00007fff86570c27 tiny_malloc_from_free_list + 1196
[Fruity:49034] [ 3] 5 libSystem.B.dylib 0x00007fff8656fabd szone_malloc_should_clear + 242
[Fruity:49034] [ 4] 6 libopen-pal.0.dylib 0x0000000100187b9f opal_memory_base_open + 527
[Fruity:49034] [ 5] 7 libSystem.B.dylib 0x00007fff8656f98a malloc_zone_malloc + 82
[Fruity:49034] [ 6] 8 libSystem.B.dylib 0x00007fff8656dc88 malloc + 44
[Fruity:49034] [ 7] 9 libSystem.B.dylib 0x00007fff8657846d asprintf + 157
[Fruity:49034] [ 8] 10 libopen-rte.0.dylib 0x000000010013aebc orte_schema_base_get_job_segment_name + 108
[Fruity:49034] [ 9] 11 libopen-rte.0.dylib 0x000000010013d899 orte_smr_base_set_proc_state + 57
[Fruity:49034] [10] 12 libmpi.0.dylib 0x0000000100063758 ompi_mpi_finalize + 312
[Fruity:49034] [11] 13 Assignment31 0x0000000100002642 main + 491
[Fruity:49034] [12] 14 Assignment31 0x0000000100001688 start + 52
[Fruity:49034] *** End of error message ***
mpirun noticed that job rank 0 with PID 49033 on node Fruity.local exited on signal 15 (Terminated).
1 additional process aborted (not shown)
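Here is the main function of my code. I'm sure there are some bad C++ practices in here (I haven't used the language in years and I'm self-taught), but it does output the correct values. I can post the rest of the file if needed; I just didn't want to make this question huge if there's an obvious mistake.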
int main(int argc, char* argv[]){
    /* start up MPI */
    MPI_Init(&argc, &argv);
    /* find out process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    /* find out number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
    /* find which nodes this processor is responsible for */
    findStartAndEndPositions();
    /* Initialize the array to its starting values. */
    initializeArray();
    /* Find the elements that are dependent on outside processors */
    findDependentElements();
    MPI_Barrier(MPI_COMM_WORLD);
    if(myRank == 0){
        startTime = MPI_Wtime();
        printArray();
    }
    int iter;
    for(iter = 0; iter < NUM_ITERATIONS; iter++){
        doCommunication();
        MPI_Barrier(MPI_COMM_WORLD);
        doIteration();
    }
    double check = computeCheck();
    double receive = 0;
    if(myRank == 0){
        MPI_Reduce(&check, &receive, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        std::cout << "The total time was: " << MPI_Wtime() - startTime << " \n";
        std::cout << "The checksum was: " << receive << " \n";
        printArray();
    }
    else{
        MPI_Reduce(&check, &receive, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    }
    /* shut down MPI */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
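Edit: I've narrowed the problem down to somewhere in my doIteration function. I only get the error when that function is called, and only when more than one process is running. Here is my doIteration function. It should replace every value of the matrix that is not on the edge with the maximum of itself and its four neighbors. The values should only be written back once the whole pass is complete (hence the temp array).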
void doIteration(){
    int pos;
    double* temp = new double[end - start + 1];
    for(pos = start; pos <= end; pos++){
        int i, row, col;
        double max;
        convertToRowCol(pos, &row, &col);
        if(isEdgeNode(row, col))
            continue;
        int dependents[4];
        getDependentsOfPosition(pos, dependents);
        max = a[row][col];
        for(i = 0; i < 4; i++){
            if(isInvalidPos(dependents[i]))
                continue;
            int dRow, dCol;
            convertToRowCol(dependents[i], &dRow, &dCol);
            max = std::max(max, a[dRow][dCol]);
        }
        temp[pos] = max;
    }
    for(pos = start; pos <= end; pos++){
        int row, col;
        convertToRowCol(pos, &row, &col);
        if(! isEdgeNode(row, col))
            a[row][col] = temp[pos];
    }
    delete [] temp;
}
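Answer 0 (score: 0)
I'm not sure whether this is the cause, but MPI_Reduce is a collective call that every rank makes with the same arguments, so it is normally written once, outside the if/else, rather than duplicated in both branches. Try this and see if it helps.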
MPI_Reduce(&check, &receive, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
if(myRank == 0){
    std::cout << "The total time was: " << MPI_Wtime() - startTime << " \n";
    std::cout << "The checksum was: " << receive << " \n";
    printArray();
}
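Something else that may be worth checking in the posted doIteration: temp is allocated with end - start + 1 elements but indexed as temp[pos], with pos running from start to end. On any rank where start is greater than 0, those accesses fall past the end of the buffer. That would be consistent with the error only showing up with more than one process (with a single process, start is presumably 0) and with the backtrace failing inside malloc, which is a common symptom of earlier heap corruption. Below is a minimal sketch of the local-offset indexing, not your actual code: updateOwnedRange is a hypothetical name, and the grid is simplified to a 1-D vector with two neighbors instead of four.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-rank update. `values` plays the role of
// the global array and [start, end] is the inclusive range this rank owns.
// Each owned element becomes the max of itself and its 1-D neighbours
// (the original code uses four 2-D neighbours and skips edge nodes).
void updateOwnedRange(std::vector<double>& values, int start, int end) {
    assert(start >= 0 && end >= start &&
           end < static_cast<int>(values.size()));

    // The scratch buffer has exactly end - start + 1 entries, so it is
    // indexed with the local offset (pos - start), never the global pos.
    std::vector<double> temp(static_cast<std::size_t>(end - start + 1));
    const int last = static_cast<int>(values.size()) - 1;

    // First pass: compute new values from the old ones only.
    for (int pos = start; pos <= end; ++pos) {
        double best = values[pos];
        if (pos > 0)    best = std::max(best, values[pos - 1]);
        if (pos < last) best = std::max(best, values[pos + 1]);
        temp[pos - start] = best;
    }
    // Second pass: write the results back once every value is computed.
    for (int pos = start; pos <= end; ++pos) {
        values[pos] = temp[pos - start];
    }
}
```

In the original function the equivalent change would be to use temp[pos - start] in both loops (or to allocate temp with end + 1 elements, at the cost of wasted memory).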