我正在编写一个程序,它将匹配一个程序段(一组4个双精度值,在一定的绝对值范围内)与另一个程序段匹配。
基本上,我将在main中调用该函数。
矩阵有4399行和500列。我正在尝试使用OpenMp来加速任务,但我的代码似乎在最内层循环(实际创建块发生create_Block(rrr[k], i);
)中具有竞争条件。
可以忽略所有功能细节,因为它们在串行版本中运行良好。这里唯一的重点是OpenMP衍生产品。
int main(void) {
readKey("keys.txt");
double** jz = readMatrix("data.txt");
int j = 0;
int i = 0;
int k = 0;
#pragma omp parallel for firstprivate(i) shared(Big_Block,NUM_OF_BLOCK,SIZE_OF_COLLECTION,b)
for (i = 0; i < 50; i++) {
printf("THIS IS COLUMN %d\n", i);
double*c = readCol(jz, i, 4400);
#pragma omp parallel for firstprivate(j) shared(i,Big_Block,NUM_OF_BLOCK,SIZE_OF_COLLECTION,b)
for (j=0; j < 4400; j++) {
// printf("This is fixed row %d from column %d !!!!!!!!!!\n",j,i);
int* one_collection = collection(c, j, 4400);
// MODIFY THE DYMANIC ALLOCATION OF SPACES (SIZE_OF_COMBINATION) IN combNonRec() function.
if (get_combination_size(SIZE_OF_COLLECTION, M) >= 4) {
//GET THE 2D-ARRAY OF COMBINATION
int** rrr = combNonRec(one_collection, SIZE_OF_COLLECTION, M);
#pragma omp parallel for firstprivate(k) shared(i,j,Big_Block,NUM_OF_BLOCK,SIZE_OF_COLLECTION,b)
for (k = 0; k < get_combination_size(SIZE_OF_COLLECTION, M); k++) {
create_Block(rrr[k], i); //ACTUAL CREATION OF BLOCK !!!!!!!
printf("This is block %d \n", NUM_OF_BLOCK);
add_To_Block_Collection();
}
free(rrr);
}
free(one_collection);
}
//OpenMP for j
free(c);
}
// OpenMP for i
collision();
}
而串行结果有400个常量块。
Big_Block,NUM_OF_BLOCK,SIZE_OF_COLLECTION是全局变量。
我是否在衍生声明中做错了什么?什么可能导致这样的问题?