I'm running into a problem parallelizing the following matrix multiplication program: the optimized versions are slower than, or at best as fast as, the sequential version. I must have a bug somewhere, but I can't find it... I've also tested it on other machines and got the same results...
Thanks for your help.
Main:
int main(int argc, char** argv){
if((matrixA).size != (matrixB).size){
fprintf(ResultFile,"\tError for %s and %s - Matrix A and B are not of the same size ...\n", argv[1], argv[2]);
}
else{
allocateResultMatrix(&resultMatrix, matrixA.size, 0);
if(*argv[5] == '1'){ /* Sequential execution */
begin = clock();
matrixMultSeq(&matrixA, &matrixB, &resultMatrix);
end = clock();
}
if(*argv[5] == '2'){ /* Execution with OpenMP */
printf("Max number of threads: %i \n",omp_get_max_threads());
begin = clock();
matrixMultOmp(&matrixA, &matrixB, &resultMatrix);
end = clock();
}
if(*argv[5] == '3'){ /* Execution with PThreads */
pthread_t threads[NUMTHREADS];
pthread_attr_t attr;
int i;
struct parameter arg[NUMTHREADS];
pthread_attr_init(&attr); /* Initialize the thread attributes */
begin = clock();
for(i=0; i<NUMTHREADS; i++){ /* Initialize the individual threads */
arg[i].id = i;
arg[i].num_threads = NUMTHREADS;
arg[i].dimension = matrixA.size;
arg[i].matrixA = &matrixA;
arg[i].matrixB = &matrixB;
arg[i].resultMatrix = &resultMatrix;
pthread_create(&threads[i], &attr, worker, (void *)(&arg[i]));
}
pthread_attr_destroy(&attr);
for(i=0; i<NUMTHREADS; i++){ /* Wait for the threads to return */
pthread_join(threads[i], NULL);
}
end = clock();
}
t=end - begin;
t/=CLOCKS_PER_SEC;
if(*argv[5] == '1')
fprintf(ResultFile, "\tTime for sequential multiplication: %0.10f seconds\n\n", t);
if(*argv[5] == '2')
fprintf(ResultFile, "\tTime for OpenMP multiplication: %0.10f seconds\n\n", t);
if(*argv[5] == '3')
fprintf(ResultFile, "\tTime for PThread multiplication: %0.10f seconds\n\n", t);
}
}
}
void matrixMultOmp(struct matrix * matrixA, struct matrix * matrixB, struct matrix * resultMatrix){
int i, j, k, l;
double sum = 0;
l = (*matrixA).size;
#pragma omp parallel for private(j,k) firstprivate (sum)
for(i=0; i<l; i++){
for(j=0; j<l; j++){
sum = 0;
for(k=0; k<l; k++){
sum = sum + (*matrixA).matrixPointer[i][k]*(*matrixB).matrixPointer[k][j];
}
(*resultMatrix).matrixPointer[i][j] = sum;
}
}
}
void mm(int thread_id, int numthreads, int dimension, struct matrix* a, struct matrix* b, struct matrix* c){
int i,j,k;
double sum;
i = thread_id;
while (i < dimension) {
for (j = 0; j < dimension; j++) {
sum = 0;
for (k = 0; k < dimension; k++) {
sum = sum + (*a).matrixPointer[i][k] * (*b).matrixPointer[k][j];
}
(*c).matrixPointer[i][j] = sum;
}
i+=numthreads;
}
}
void * worker(void * arg){
struct parameter * p = (struct parameter *) arg;
mm((*p).id, (*p).num_threads, (*p).dimension, (*p).matrixA, (*p).matrixB, (*p).resultMatrix);
pthread_exit((void *) 0);
}
Here is the output with timings:
Starting calculating resultMatrix for matrices/SimpleMatrixA.txt and matrices/SimpleMatrixB.txt ...
Size of matrixA: 6 elements
Size of matrixB: 6 elements
Time for sequential multiplication: 0.0000030000 seconds
Starting calculating resultMatrix for matrices/SimpleMatrixA.txt and matrices/SimpleMatrixB.txt ...
Size of matrixA: 6 elements
Size of matrixB: 6 elements
Time for OpenMP multiplication: 0.0002440000 seconds
Starting calculating resultMatrix for matrices/SimpleMatrixA.txt and matrices/SimpleMatrixB.txt ...
Size of matrixA: 6 elements
Size of matrixB: 6 elements
Time for PThread multiplication: 0.0006680000 seconds
Starting calculating resultMatrix for matrices/ShortMatrixA.txt and matrices/ShortMatrixB.txt ...
Size of matrixA: 100 elements
Size of matrixB: 100 elements
Time for sequential multiplication: 0.0075190002 seconds
Starting calculating resultMatrix for matrices/ShortMatrixA.txt and matrices/ShortMatrixB.txt ...
Size of matrixA: 100 elements
Size of matrixB: 100 elements
Time for OpenMP multiplication: 0.0076710000 seconds
Starting calculating resultMatrix for matrices/ShortMatrixA.txt and matrices/ShortMatrixB.txt ...
Size of matrixA: 100 elements
Size of matrixB: 100 elements
Time for PThread multiplication: 0.0068080002 seconds
Starting calculating resultMatrix for matrices/LargeMatrixA.txt and matrices/LargeMatrixB.txt ...
Size of matrixA: 1000 elements
Size of matrixB: 1000 elements
Time for sequential multiplication: 9.6421155930 seconds
Starting calculating resultMatrix for matrices/LargeMatrixA.txt and matrices/LargeMatrixB.txt ...
Size of matrixA: 1000 elements
Size of matrixB: 1000 elements
Time for OpenMP multiplication: 10.5361270905 seconds
Starting calculating resultMatrix for matrices/LargeMatrixA.txt and matrices/LargeMatrixB.txt ...
Size of matrixA: 1000 elements
Size of matrixB: 1000 elements
Time for PThread multiplication: 9.8952226639 seconds
Starting calculating resultMatrix for matrices/HugeMatrixA.txt and matrices/HugeMatrixB.txt ...
Size of matrixA: 5000 elements
Size of matrixB: 5000 elements
Time for sequential multiplication: 1981.1383056641 seconds
Starting calculating resultMatrix for matrices/HugeMatrixA.txt and matrices/HugeMatrixB.txt ...
Size of matrixA: 5000 elements
Size of matrixB: 5000 elements
Time for OpenMP multiplication: 2137.8527832031 seconds
Starting calculating resultMatrix for matrices/HugeMatrixA.txt and matrices/HugeMatrixB.txt ...
Size of matrixA: 5000 elements
Size of matrixB: 5000 elements
Time for PThread multiplication: 1977.5153808594 seconds
Answer (score 2):
As already mentioned in the comments, your first major problem is the use of clock(). It returns the processor time consumed by your program, while what you are looking for is the wall time of the execution. For sequential code the two are the same, but with multiple cores that no longer holds: clock() adds up the CPU time of all threads, so a parallel run can report roughly as much time as the sequential one even when it finishes much sooner. Fortunately, OpenMP has you covered: use the function omp_get_wtime() instead.
Finally, you need much larger matrices to see any benefit from multithreading. If creating and managing the threads costs more than the actual work the threads perform, you will never see a gain from parallelism, so timing a 6x6 matrix multiplication is meaningless. I would start at 1000x1000 and check at least 2000x2000 and 8000x8000 as well.