I am running a very simple routine in C++ with OpenMP and measuring the elapsed time. The code reads as follows:
#include <iostream>
#include <math.h>
#include "timer.h"
#include <omp.h>
int main ()
{
double start,finish;
int i;
int n=8000;
double a[n];
double b[n];
double c[n];
GET_TIME(start);
#pragma omp parallel private(i,a) shared(b,c,n)
{
#pragma omp for
for (i=0; i<n-1; i++)
b[i] += (a[i] + a[i+1])/2;
#pragma omp for
for (i=0; i<n-1; i++)
c[i] += (a[i] + a[i+1])/2;
}
GET_TIME(finish);
std::cout<< "Elapsed time is" <<(finish-start)<<"seconds";
return 0;
}
The code is compiled and run with the following bash script (note that the number of threads is set through the environment variable OMP_NUM_THREADS=$n):
#!/bin/bash
clear
g++ -O3 -o test test.cpp -fopenmp
for n in $(seq 1 8); do
export OMP_NUM_THREADS=$n
./test
echo threads=$n
done
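The general trend I observe is that performance gets worse as the number of threads increases (the exact numbers vary from run to run, of course):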
Elapsed time is0.000161886secondsthreads=1
Elapsed time is0.00019002secondsthreads=2
Elapsed time is0.00226498secondsthreads=3
Elapsed time is0.000210047secondsthreads=4
Elapsed time is0.000212908secondsthreads=5
Elapsed time is0.00920105secondsthreads=6
Elapsed time is0.00937104secondsthreads=7
Elapsed time is0.000834942secondsthreads=8
Any suggestions for improving the performance (rather than reducing it)? Thank you very much!
Answer 0 (score: 1)
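You could do it like this, which increases the number of operations each thread performs. The idea is to offset the overhead of starting the threads by actually giving them more work to do. Also, there is no need to declare b, c, or n as shared: variables declared outside the parallel region are shared by default.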
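// Only the loop index needs to be private. a, b, c and n stay shared;
// listing them in private() would give each thread its own uninitialized copy.
#pragma omp parallel private(i)
{
#pragma omp for schedule(static)
for (i=0; i<n-1; i++){
b[i] += (a[i] + a[i+1])/2;
c[i] += (a[i] + a[i+1])/2;}
}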
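To make the suggestion above easier to try out, here is a minimal sketch of the fused loop together with a simple timing helper. Several things in it are my own assumptions rather than part of the original code: the function names accumulate_midpoints and average_seconds are hypothetical, std::vector replaces the stack arrays so that the inputs are initialized and n can be made large, and std::chrono stands in for GET_TIME because timer.h is not shown in the question. The helper runs one untimed warm-up call first, since implementations typically create their threads on the first parallel region, and then averages over several repeats.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Fused kernel: one parallel region and one loop that performs both updates.
// a is only read; b and c are written at index i only, so iterations are
// independent and can be split across threads without a data race.
void accumulate_midpoints(const std::vector<double>& a,
                          std::vector<double>& b,
                          std::vector<double>& c)
{
    assert(b.size() == a.size() && c.size() == a.size());
    const long n = static_cast<long>(a.size());
    // The loop variable of a parallel for is private automatically;
    // a, b, c and n are shared by default.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n - 1; ++i) {
        const double mid = (a[i] + a[i + 1]) / 2;
        b[i] += mid;
        c[i] += mid;
    }
}

// Hypothetical timing helper: average wall-clock seconds per kernel call.
double average_seconds(std::size_t n, int repeats)
{
    std::vector<double> a(n), b(n, 0.0), c(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) a[i] = static_cast<double>(i);

    accumulate_midpoints(a, b, c);  // warm-up call, not timed

    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) accumulate_midpoints(a, b, c);
    const auto finish = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(finish - start).count() / repeats;
}
```

Calling average_seconds(8000, 1000) from main and printing the result, compiled with the same g++ -O3 -fopenmp command as in the question's script, should give a steadier number than a single cold measurement. With only 8000 elements the loop itself takes very little time, so it is worth also trying a much larger n to see whether extra threads start to pay off.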