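I've been trying to parallelize this code for about two days and keep running into logic errors. The program finds the area under a curve by summing over a very small dx, computing each discrete value of the integral. I'm trying to implement this with OpenMP, but I have practically no experience with OpenMP, so I'd appreciate your help. The actual goal is to parallelize the suma variable across threads, so that each thread computes fewer of the integral's values. The program compiles successfully, but when I run it, it returns wrong results.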
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(int argc, char *argv[]){
float down = 1, up = 100, dx, suma = 0, j;
int steps, i, nthreads, tid;
long starttime, finishtime, runtime;
starttime = omp_get_wtime();
steps = atoi(argv[1]);
dx = (up - down) / steps;
nthreads = omp_get_num_threads();
tid = omp_get_thread_num();
#pragma omp parallel for private(i, j, tid) reduction(+:suma)
for(i = 0; i < steps; i++){
for(j = (steps / nthreads) * tid; j < (steps / nthreads) * (tid + 1); j += dx){
suma += ((j * j * j) + ((j + dx) * (j + dx) * (j + dx))) / 2 * dx;
}
}
printf("For %d steps the area of the integral 3 * x^2 + 1 from %f to %f is: %f\n", steps, down, up, suma);
finishtime = omp_get_wtime();
runtime = finishtime - starttime;
printf("Runtime: %ld\n", runtime);
return (0);
}
Answer 0 (score: 3)
The problem is in your for loop. If you use the for pragma, OpenMP does the loop splitting for you:
#pragma omp parallel for private(i) reduction(+:suma)
for(i = 0; i < steps; i++) {
// recover the x-position of the i-th step
float x = down + i * dx;
// evaluate the function at x
float y = 3.0f * x * x + 1;
// add the area of the rectangle to the overall integral
suma += y * dx;
}
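Even if you wanted to keep a scheme where you compute the indices yourself, it would be problematic as written: the outer loop should execute only nthreads times, not steps times.
You should also consider switching to double for better accuracy.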
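For comparison, here is a minimal sketch of the hand-partitioned scheme done consistently, assuming you really do want to compute the indices yourself. The parallel region plays the role of the outer loop, so it runs once per thread; tid and nthreads are queried inside it; and each thread sums one contiguous block of indices with the rectangle rule from the loop above, in double as suggested. The function name integrate_blocked is made up for illustration, and the #ifdef _OPENMP guards are only there so the sketch also builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sketch: left-rectangle rule for f(x) = 3x^2 + 1 on [down, up] with the
 * index range [0, steps) split by hand into one contiguous block per
 * thread. The parallel region itself is the "outer loop": it runs once
 * per thread, not steps times. */
double integrate_blocked(double down, double up, int steps)
{
    const double dx = (up - down) / steps;
    double suma = 0.0;

    #pragma omp parallel reduction(+:suma)
    {
#ifdef _OPENMP
        /* Only meaningful when queried inside the parallel region. */
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
#else
        int tid = 0, nthreads = 1;   /* serial build: a single "thread" */
#endif
        /* Bounds computed as steps*tid/nthreads so the blocks tile
         * [0, steps) exactly, even when steps % nthreads != 0. */
        int lo = (int)((long long)steps * tid / nthreads);
        int hi = (int)((long long)steps * (tid + 1) / nthreads);

        for (int i = lo; i < hi; i++) {
            double x = down + i * dx;          /* position of step i */
            suma += (3.0 * x * x + 1.0) * dx;  /* rectangle area */
        }
    }
    return suma;
}
```

The question's (steps / nthreads) * tid drops the remainder whenever steps is not a multiple of nthreads, so some indices would never be visited; the bounds above avoid that. For example, integrate_blocked(1.0, 100.0, 1000000) comes out about 1.5 below the exact 1000098, and that gap is the rectangle rule's truncation error, not a threading problem.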
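Answer 1 (score: 0)
Let's consider just the case nthreads = 1 (and tid = 0), which is in fact what your program always gets: omp_get_num_threads() and omp_get_thread_num() are called outside the parallel region, where they return 1 and 0. With those values, this: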
#pragma omp parallel for private(i, j, tid) reduction(+:suma)
for(i = 0; i < steps; i++){
for(j = (steps / nthreads) * tid; j < (steps / nthreads) * (tid + 1); j += dx){
suma += ((j * j * j) + ((j + dx) * (j + dx) * (j + dx))) / 2 * dx;
}
}
becomes this:
for(i = 0; i < steps; i++){
for(j = 0; j < steps; j += dx){
suma += ((j * j * j) + ((j + dx) * (j + dx) * (j + dx))) / 2 * dx;
}
}
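You can start to see the problem: the loops are nested, so the work grows like steps² (or worse) rather than like steps, and j runs from 0 to steps instead of over the integration interval [down, up].
Also, your second loop doesn't make sense, since you're incrementing by dx. The same confusion between indices (i, j) and positions in the physical domain (x = down + i*dx) shows up in your increment: j+dx doesn't make any sense. Presumably you want to increase suma by (f(x) + f(x'))*dx/2 with x' = x + dx (i.e., the trapezoidal rule); that should be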
float x = down + i*dx;
suma += dx * ((3 * x * x + 1) + (3 * (x + dx) * (x + dx) + 1)) / 2;
As ebo pointed out, you want to sum the integrand, not its antiderivative.
Now, if we add a check against the exact answer:
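To make the serial version concrete, here is a minimal sketch of the composite trapezoidal rule that sums the integrand f(x) = 3x² + 1 (not its antiderivative x³ + x), written as a standalone function in double precision. The names f and trapezoid_serial are illustrative, not from the original code.

```c
/* The integrand. The question's code summed x^3 instead, which is
 * (part of) the antiderivative, not the function being integrated. */
static double f(double x)
{
    return 3.0 * x * x + 1.0;
}

/* Composite trapezoidal rule on [down, up] with `steps` equal panels. */
double trapezoid_serial(double down, double up, int steps)
{
    const double dx = (up - down) / steps;
    double suma = 0.0;
    for (int i = 0; i < steps; i++) {
        double x = down + i * dx;               /* left edge of panel i */
        suma += dx * (f(x) + f(x + dx)) / 2.0;  /* area of one trapezoid */
    }
    return suma;
}
```

With down = 1, up = 100 and steps = 1000 this returns about 1000098.485.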
printf("For %d steps the area of the integral 3 * x^2 + 1 from %f to %f is: %f (expected: %f)\n",
steps, down, up, suma, up*up*up-down*down*down + up - down);
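The expected value in that check is just the antiderivative x³ + x evaluated at the endpoints, which is what up*up*up - down*down*down + up - down computes:

```latex
\int_{1}^{100} \left(3x^2 + 1\right)\,dx
  = \Bigl[\,x^3 + x\,\Bigr]_{1}^{100}
  = \left(100^3 + 100\right) - \left(1^3 + 1\right)
  = 1000098
```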
Running it serially, we start to get the right answer:
$ ./foo 10
For 10 steps the area of the integral 3 * x^2 + 1 from 1.000000 to 100.000000 is: 1004949.375000 (expected: 1000098.000000)
Runtime: 0
$ ./foo 100
For 100 steps the area of the integral 3 * x^2 + 1 from 1.000000 to 100.000000 is: 1000146.562500 (expected: 1000098.000000)
Runtime: 0
$ ./foo 1000
For 1000 steps the area of the integral 3 * x^2 + 1 from 1.000000 to 100.000000 is: 1000098.437500 (expected: 1000098.000000)
Runtime: 0
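The size of the remaining error is predictable as well. For the composite trapezoidal rule with n panels of width h = (b − a)/n, the error is (b − a)h²f''(ξ)/12 for some ξ in [a, b]; here f'' = 6 is constant, so the formula is exact:

```latex
T_n - I = \frac{(b-a)\,h^2}{12}\, f'' = \frac{99 \cdot 6}{12}\, h^2 = 49.5\,h^2,
\qquad h = \frac{99}{n}
```

That gives about 4851.5, 48.5 and 0.485 for n = 10, 100 and 1000, agreeing with the observed gaps (4851.375, 48.5625 and 0.4375) up to float rounding.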
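There's no point in worrying about the OpenMP case until the serial case works.
Once you do get to OpenMP, the easiest approach, as ebo pointed out, is to let OpenMP do the loop decomposition for you, e.g.: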
#pragma omp parallel for reduction(+:suma)
for(i = 0; i < steps; i++){
float x = down + i*dx;
suma += dx * ((3 * x * x + 1) + (3 * (x + dx) * (x + dx) + 1)) / 2;
}
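A self-contained sketch of that loop, in double and wrapped in a function (trapezoid_omp and wall_seconds are illustrative names). It also keeps the timing in a double: omp_get_wtime() returns a double number of seconds, and storing it in a long, as the question does, truncates it to whole seconds, so short runs like the ones shown here can only report 0. Without OpenMP the sketch falls back to clock(), which measures CPU time rather than wall time.

```c
#include <stddef.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Elapsed-time source as a double; never truncate it to an integer. */
static double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;  /* CPU-time fallback */
#endif
}

static double f(double x) { return 3.0 * x * x + 1.0; }

/* Trapezoidal rule; OpenMP splits the i range among the threads and
 * the reduction adds up each thread's private partial sum. */
double trapezoid_omp(double down, double up, int steps, double *seconds)
{
    const double dx = (up - down) / steps;
    double suma = 0.0;
    double start = wall_seconds();

    #pragma omp parallel for reduction(+:suma)
    for (int i = 0; i < steps; i++) {
        double x = down + i * dx;
        suma += dx * (f(x) + f(x + dx)) / 2.0;
    }

    if (seconds != NULL)
        *seconds = wall_seconds() - start;
    return suma;
}
```

Compared with the serial loop, the only change is the pragma; the result should match the serial one up to rounding, since the reduction may add the partial sums in a different order.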
Running it, one gets:
$ setenv OMP_NUM_THREADS 1
$ ./foo 1000
For 1000 steps the area of the integral 3 * x^2 + 1 from 1.000000 to 100.000000 is: 1000098.437500 (expected: 1000098.000000)
Runtime: 0
$ setenv OMP_NUM_THREADS 2
$ ./foo 1000
For 1000 steps the area of the integral 3 * x^2 + 1 from 1.000000 to 100.000000 is: 1000098.437500 (expected: 1000098.000000)
Runtime: 0
$ setenv OMP_NUM_THREADS 4
$ ./foo 1000
For 1000 steps the area of the integral 3 * x^2 + 1 from 1.000000 to 100.000000 is: 1000098.625000 (expected: 1000098.000000)
Runtime: 0
$ setenv OMP_NUM_THREADS 8
$ ./foo 1000
For 1000 steps the area of the integral 3 * x^2 + 1 from 1.000000 to 100.000000 is: 1000098.500000 (expected: 1000098.000000)
You can do the blocking explicitly in OpenMP if you really want to, but you should have a reason for doing so.
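If you do have such a reason, one option short of computing the indices yourself is to request a static schedule. With schedule(static) and no chunk size, the OpenMP specification divides the iterations into approximately equal chunks, at most one per thread, assigned in thread-number order. That is the contiguous blocking the question was aiming for. The sketch below (record_static_owners is a made-up name) only records which thread ran each iteration so you can inspect the blocks; the #ifdef _OPENMP guard lets it build without OpenMP too.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Record, for each iteration i, the ID of the thread that executed it.
 * With schedule(static) and no chunk size, each thread gets at most one
 * contiguous chunk, and chunks follow thread-number order, so owner[]
 * is non-decreasing. */
void record_static_owners(int steps, int *owner)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < steps; i++) {
#ifdef _OPENMP
        owner[i] = omp_get_thread_num();
#else
        owner[i] = 0;   /* serial build: a single "thread" */
#endif
    }
}
```

For example, with OMP_NUM_THREADS set to 4 and 1000 iterations you should see four runs of equal thread IDs; the exact chunk sizes are left to the implementation.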