Question

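I have been trying to parallelize this code for about two days and keep running into logic errors. The program computes the area under an integral by summing over a very small dx, evaluating the integrand at each discrete value. I am trying to implement this with OpenMP, but I have no real OpenMP experience, so I would appreciate your help. The actual goal is to parallelize the suma variable across threads so that each thread computes fewer values of the integral. The program compiles successfully, but when I run it, it returns wrong results.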

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[]){
    float down = 1, up = 100, dx, suma = 0, j;
    int steps, i, nthreads, tid;
    long starttime, finishtime, runtime; 

    starttime = omp_get_wtime();
    steps = atoi(argv[1]);
    dx = (up - down) / steps;

    nthreads = omp_get_num_threads();
    tid = omp_get_thread_num();
    #pragma omp parallel for private(i, j, tid) reduction(+:suma)
    for(i = 0; i < steps; i++){
        for(j = (steps / nthreads) * tid; j < (steps / nthreads) * (tid + 1); j += dx){
            suma += ((j * j * j) + ((j + dx) * (j + dx) * (j + dx))) / 2 * dx;
        }
    }
    printf("For %d steps the area of the integral  3 * x^2 + 1 from %f to %f is: %f\n", steps, down, up, suma);
    finishtime = omp_get_wtime();
    runtime = finishtime - starttime;
    printf("Runtime: %ld\n", runtime);
    return (0);
}

Answer 1

The problem lies in your for loop. If you use the for pragma, OpenMP does the loop splitting for you:

#pragma omp parallel for private(i) reduction(+:suma)
for(i = 0; i < steps; i++) {
    // recover the x-position of the i-th step
    float x = down + i * dx;
    // evaluate the integrand at x
    float y = 3.0f * x * x + 1.0f;
    // add the area of this rectangle to the overall integral
    suma += y * dx;
}

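Even if you converted this to a parallelization scheme in which you have to compute the indices yourself, the code as written would be problematic: the outer loop should then execute only nthreads times.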

You should also consider switching to double for better accuracy.
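To illustrate both points together, here is a minimal sketch of the same rectangle-rule loop in double precision. The function name integrate_rect is only a label chosen for this example, not something from the question. Note that the pragma is simply ignored if the compiler is run without OpenMP support, in which case the loop runs serially.

```c
/* Left-rectangle rule for the integrand 3*x^2 + 1 on [down, up].
 * integrate_rect is a hypothetical name used for illustration only. */
double integrate_rect(double down, double up, int steps)
{
    const double dx = (up - down) / steps;
    double suma = 0.0;
    int i;

    /* OpenMP splits the iteration range across threads; each thread
     * accumulates a private copy of suma that is combined at the end. */
    #pragma omp parallel for reduction(+:suma)
    for (i = 0; i < steps; i++) {
        const double x = down + i * dx;   /* x-position of the i-th step */
        const double y = 3.0 * x * x + 1.0;
        suma += y * dx;
    }
    return suma;
}
```

Calling integrate_rect(1.0, 100.0, steps) with a large steps should approach the exact value 1000098 from below, since the integrand is increasing and the left-rectangle rule therefore underestimates it.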

Answer 2

Let's consider just the case of threads = 1. Then this:

#pragma omp parallel for private(i, j, tid) reduction(+:suma)
for(i = 0; i < steps; i++){
    for(j = (steps / nthreads) * tid; j < (steps / nthreads) * (tid + 1); j += dx){
        suma += ((j * j * j) + ((j + dx) * (j + dx) * (j + dx))) / 2 * dx;
    }
}

becomes this:

for(i = 0; i < steps; i++){
    for(j = 0; j < steps; j += dx){
        suma += ((j * j * j) + ((j + dx) * (j + dx) * (j + dx))) / 2 * dx;
    }
}

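You can start to see the problem: you are essentially looping over steps² values, when a sum over steps intervals needs only a single loop.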

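Further, your second loop does not make sense, because you are incrementing by dx. The same confusion between indices (i, j) and positions in the physical domain (i * dx) shows up in your increment: j + dx does not mean anything. Presumably you want to increase suma by (f(x) + f(x')) * dx / 2 (i.e., the trapezoidal rule); that should be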

        float x = down + i*dx;
        suma += dx * ((3 * x * x + 1) + (3 * (x + dx) * (x + dx) + 1)) / 2;

As ebo points out, you want to sum the integrand, not its antiderivative.

Now, if we include a check of the answer:

printf("For %d steps the area of the integral  3 * x^2 + 1 from %f to %f is: %f (expected: %f)\n",
            steps, down, up, suma, up*up*up-down*down*down + up - down);
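For reference, the expected value in that printf comes from the antiderivative of the integrand evaluated at the two limits. Written out for down = 1 and up = 100:

```latex
\int_{1}^{100} \left(3x^{2} + 1\right)\,dx
  = \Bigl[\,x^{3} + x\,\Bigr]_{1}^{100}
  = \left(100^{3} + 100\right) - \left(1^{3} + 1\right)
  = 1000100 - 2
  = 1000098
```

This is the same expression as up*up*up - down*down*down + up - down in the code.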

Running it serially, we start to get the right answer:

$ ./foo 10
For 10 steps the area of the integral  3 * x^2 + 1 from 1.000000 to 100.000000 is: 1004949.375000 (expected: 1000098.000000)
Runtime: 0
$ ./foo 100
For 100 steps the area of the integral  3 * x^2 + 1 from 1.000000 to 100.000000 is: 1000146.562500 (expected: 1000098.000000)
Runtime: 0
$ ./foo 1000
For 1000 steps the area of the integral  3 * x^2 + 1 from 1.000000 to 100.000000 is: 1000098.437500 (expected: 1000098.000000)
Runtime: 0
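The way the error shrinks in these runs matches what the trapezoidal rule predicts. As a worked check (my own addition, not part of the original output): for a quadratic integrand the composite trapezoidal error is exactly proportional to dx squared, and here f''(x) = 6 and dx = 99 / steps.

```latex
T_{dx} - \int_{1}^{100} f(x)\,dx
  = \frac{(up - down)\,dx^{2}}{12}\, f''
  = \frac{99 \cdot 6}{12}\, dx^{2}
  = 49.5\, dx^{2}

\text{steps} = 10:\;   dx = 9.9,\;   49.5\,dx^{2} \approx 4851.5
\qquad
\text{steps} = 100:\;  dx = 0.99,\;  49.5\,dx^{2} \approx 48.5
\qquad
\text{steps} = 1000:\; dx = 0.099,\; 49.5\,dx^{2} \approx 0.49
```

These agree with the printed results (1004949.4, 1000146.6, 1000098.4) up to single-precision rounding, so each tenfold increase in steps reduces the error by about a factor of 100.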

There is no point in worrying about the OpenMP case until the serial case works.

Once it is time for OpenMP, as ebo points out, the easiest thing to do is to let OpenMP do the loop decomposition for you, e.g.:

#pragma omp parallel for reduction(+:suma)
    for(i = 0; i < steps; i++){
        float x = down + i*dx;
        suma += dx * ((3 * x * x + 1) + (3 * (x + dx) * (x + dx) + 1)) / 2;
    }

Running this, one gets

$ setenv OMP_NUM_THREADS 1
$ ./foo 1000
For 1000 steps the area of the integral  3 * x^2 + 1 from 1.000000 to 100.000000 is: 1000098.437500 (expected: 1000098.000000)
Runtime: 0
$ setenv OMP_NUM_THREADS 2
$ ./foo 1000
For 1000 steps the area of the integral  3 * x^2 + 1 from 1.000000 to 100.000000 is: 1000098.437500 (expected: 1000098.000000)
Runtime: 0
$ setenv OMP_NUM_THREADS 4
$ ./foo 1000
For 1000 steps the area of the integral  3 * x^2 + 1 from 1.000000 to 100.000000 is: 1000098.625000 (expected: 1000098.000000)
Runtime: 0
$ setenv OMP_NUM_THREADS 8
$ ./foo 1000
For 1000 steps the area of the integral  3 * x^2 + 1 from 1.000000 to 100.000000 is: 1000098.500000 (expected: 1000098.000000)

You can do the blocking explicitly in OpenMP if you really want to, but you should have a reason for doing so.
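For completeness, here is a rough sketch of what explicit blocking could look like. It is one possible arrangement, not the only one: the function name integrate_blocked and the helper f are made up for this example, and double is used instead of float. The key difference from the question's code is that omp_get_num_threads() and omp_get_thread_num() are called inside the parallel region; outside of one they report 1 thread and thread id 0.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Integrand from the question; f is a hypothetical helper name. */
static double f(double x) { return 3.0 * x * x + 1.0; }

/* Trapezoidal rule with the index range split by hand, one contiguous
 * block per thread. integrate_blocked is an illustrative name. */
double integrate_blocked(double down, double up, int steps)
{
    const double dx = (up - down) / steps;
    double suma = 0.0;

    #pragma omp parallel reduction(+:suma)
    {
#ifdef _OPENMP
        /* Must be queried inside the parallel region. */
        const int nthreads = omp_get_num_threads();
        const int tid = omp_get_thread_num();
#else
        /* Fallback when compiled without OpenMP: a single "thread". */
        const int nthreads = 1;
        const int tid = 0;
#endif
        /* Block [begin, end) for this thread. The 64-bit intermediate
         * avoids overflow and spreads any remainder across threads. */
        const int begin = (int)((long long)steps * tid / nthreads);
        const int end = (int)((long long)steps * (tid + 1) / nthreads);

        for (int i = begin; i < end; i++) {
            const double x = down + i * dx;   /* index -> position */
            suma += dx * (f(x) + f(x + dx)) / 2.0;
        }
    }
    return suma;
}
```

There is a single loop over the indices of each block, with no outer loop over steps, and the reduction clause combines the per-thread partial sums. In most cases the plain parallel for version is simpler and just as fast.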

Summation with OpenMP in C