Question

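I want to learn a little about OpenMP because I would like to parallelize a huge loop. After some reading (SO, Common OMP mistakes, tutorial, etc.), I took as a first step the basic working C/MEX code given below (which yields different results for the first test case).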

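The first test sums up result values (functions serial, parallel);
the second takes values from an input array and writes the processed values to an output array (functions serial_a, parallel_a).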

My questions are:

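Why do the results of the first test differ, i.e. the results of serial and parallel?
Surprisingly, the second test succeeds. My concern is how to handle memory (array locations) that may be read by multiple threads. In the example this is meant to be emulated by sin(a[i]) / cos(a[n-i]+1.0).
Are there simple rules for deciding which variables to declare as private, shared, or reduction?
In both cases int i is declared outside the pragma, yet the second test appears to yield correct results. Is that OK, or does i have to be moved into the pragma omp parallel region, as is said here?
Any other hints on mistakes you spot?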

Code

#include "mex.h"
#include <math.h>
#include <omp.h>
#include <time.h>

double serial(int x)
{
    double sum=0;
    int i;

    for(i = 0; i<x; i++){
        sum += sin(x*i) / cos(x*i+1.0);
    }
    return sum;
}

double parallel(int x)
{
    double sum=0;
    int i;

    #pragma omp parallel num_threads(6) shared(sum) //default(none) 
    {
        //printf("    I'm thread no. %d\n", omp_get_thread_num());

        #pragma omp for private(i, x) reduction(+: sum)
        for(i = 0; i<x; i++){
            sum += sin(x*i) / cos(x*i+1.0);
        }
    }
    return sum;
}

void serial_a(double* a, int n, double* y2)
{
    int i;

    for(i = 0; i<n; i++){
         y2[i] = sin(a[i]) / cos(a[n-i]+1.0);
    }
}

void parallel_a(double* a, int n, double* y2)
{
    int i;

    #pragma omp parallel num_threads(6)
    {       
        #pragma omp for private(i)
        for(i = 0; i<n; i++){
            y2[i] = sin(a[i]) / cos(a[n-i]+1.0);
        }
    }
}

void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
    double sum, *y1, *y2, *a, s, p;
    int x, n, *d;

    /* Check for proper number of arguments. */
    if(nrhs!=2) {
        mexErrMsgTxt("Two inputs required.");
    } else if(nlhs>2) {
        mexErrMsgTxt("Too many output arguments.");
    }
    /* Get pointer to first input */
    x = (int)mxGetScalar(prhs[0]);

    /* Get pointer to second input */
    a = mxGetPr(prhs[1]);
    d = (int*)mxGetDimensions(prhs[1]);
    n = (int)d[1]; // row vector

    /* Create space for output */
    plhs[0] = mxCreateDoubleMatrix(2,1, mxREAL);
    plhs[1] = mxCreateDoubleMatrix(n,2, mxREAL);

    /* Get pointer to output array */
    y1 = mxGetPr(plhs[0]);
    y2 = mxGetPr(plhs[1]);

    {   /* Do the calculation */
        clock_t tic = clock();
        y1[0] = serial(x);
        s = (double) clock()-tic;
        printf("serial....: %.0f ms\n", s);
        mexEvalString("drawnow");

        tic = clock();
        y1[1] = parallel(x);
        p = (double) clock()-tic;
        printf("parallel..: %.0f ms\n", p);
        printf("ratio.....: %.2f \n", p/s);
        mexEvalString("drawnow");

        tic = clock();
        serial_a(a, n, y2);
        s = (double) clock()-tic;
        printf("serial_a..: %.0f ms\n", s);
        mexEvalString("drawnow");

        tic = clock();
        parallel_a(a, n, &y2[n]);
        p = (double) clock()-tic;
        printf("parallel_a: %.0f ms\n", p);
        printf("ratio.....: %.2f \n", p/s); 
    }
}

Output

>> mex omp1.c
>> [a, b] = omp1(1e8, 1:1e8);
serial....: 13399 ms
parallel..: 2810 ms
ratio.....: 0.21 
serial_a..: 12840 ms
parallel_a: 2740 ms
ratio.....: 0.21 
>> a(1) == a(2)

ans =

     0

>> all(b(:,1) == b(:,2))

ans =

     1

System

MATLAB Version: 8.0.0.783 (R2012b)
Operating System: Microsoft Windows 7 Version 6.1 (Build 7601: Service Pack 1)
Microsoft Visual Studio 2005 Version 8.0.50727.867

Answer 1

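Your function parallel has a few mistakes. The reduction can be declared on the parallel directive itself, together with the private and shared variables. When you do a reduction, do not also list the reduced variable as shared on the same directive; the reduction clause takes care of it.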

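To decide what to declare private or shared, ask yourself which variables are written to. If a variable is never written to, you normally want it to be shared. In your case the variable x does not change, so you should declare it shared. Your code lists it in private(i, x), which gives each thread its own uninitialized copy of x, so the loop bound and the arguments to sin and cos are undefined; that is why parallel returns a different value from serial. The variable i does change, so you should declare it private. To fix your function you could do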

#pragma omp parallel reduction(+:sum) private(i) shared(x)
{
    #pragma omp for 
    for(i = 0; i<x; i++){
        sum += sin(x*i) / cos(x*i+1.0);
    }
}
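If you want the compiler to enforce this rule, the default(none) clause that is commented out in the question's code can be switched on: every variable used inside the region then has to appear in a data-sharing clause, or compilation fails. A minimal sketch, assuming a hypothetical function name sum_explicit and a simpler summand than the sin/cos expression from the question:

```c
/* Sketch only: sum_explicit and the summand 1/(i+1) are illustrative
 * assumptions, not part of the original code. */
double sum_explicit(int x)
{
    double sum = 0.0;
    int i;

    /* default(none): nothing is shared implicitly, so x, i and sum
     * must each be given an explicit data-sharing attribute. */
    #pragma omp parallel default(none) shared(x) private(i) reduction(+:sum)
    {
        #pragma omp for
        for (i = 0; i < x; i++) {
            sum += 1.0 / (i + 1.0);   /* x is only read, sum is reduced */
        }
    }
    return sum;
}
```

Dropping shared(x) from the pragma should then produce a compile-time error instead of a silently wrong result.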

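However, OpenMP automatically makes the iterator of the parallelized loop private, and variables declared outside the parallel region are shared by default, so for the parallel function you can simply do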

#pragma omp parallel for reduction(+:sum)
for(i = 0; i<x; i++){
    sum += sin(x*i) / cos(x*i+1.0);
}

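Note that the only difference between this and the serial code is the pragma statement. OpenMP is designed so that you do not have to change your code apart from adding pragma statements.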
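Even with a correct pragma, a(1) == a(2) may still return 0. The reduction adds up per-thread partial sums, so the additions happen in a different order than in the serial loop, and floating-point addition is not associative; the two results can differ in the last few bits. Comparing with a tolerance is more robust. A sketch under stated assumptions: the names nearly_equal, sum_serial and sum_parallel are hypothetical, and the summand is simpler than the one in the question:

```c
/* Absolute value without pulling in libm. */
static double abs_d(double v)
{
    return v < 0.0 ? -v : v;
}

/* Hypothetical helper: true if a and b agree to a relative tolerance. */
int nearly_equal(double a, double b, double rel_tol)
{
    double diff  = abs_d(a - b);
    double scale = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);
    return diff <= rel_tol * scale;
}

/* Serial reference sum. */
double sum_serial(int n)
{
    double sum = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        sum += 1.0 / (i + 1.0);
    }
    return sum;
}

/* Same loop with a reduction; the order of additions may differ. */
double sum_parallel(int n)
{
    double sum = 0.0;
    int i;
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += 1.0 / (i + 1.0);
    }
    return sum;
}
```

On the MATLAB side the equivalent check would be something like abs(a(1) - a(2)) <= tol * abs(a(1)), with tol chosen for the problem at hand.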

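When it comes to arrays, as long as each iteration of the parallel for loop writes to a different array element, you do not have to worry about shared and private. So you can write your parallel_a function simply as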

#pragma omp parallel for
for(i = 0; i<n; i++){
    y2[i] = sin(a[i]) / cos(a[n-i]+1.0);
}

Once again, it is identical to your serial_a function except for the pragma statement.
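As for reading the same array locations from several threads: concurrent reads of a shared array are safe as long as no thread writes to those elements while the loop runs. Here a is only read and each iteration writes its own y[i], so no synchronization is needed. The sketch below uses hypothetical names (combine_serial, combine_parallel, arrays_equal) and a simpler expression. It also indexes a[n - 1 - i] rather than a[n-i], because a[n-i] reads a[n], one past the end of an n-element array, when i is 0:

```c
/* Serial reference: every element of a is read, none is written. */
void combine_serial(const double *a, int n, double *y)
{
    int i;
    for (i = 0; i < n; i++) {
        y[i] = a[i] + a[n - 1 - i];
    }
}

/* Parallel version: a[k] may be read by two different threads,
 * but each y[i] is written by exactly one iteration. */
void combine_parallel(const double *a, int n, double *y)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        y[i] = a[i] + a[n - 1 - i];
    }
}

/* Hypothetical helper: element-wise exact comparison. */
int arrays_equal(const double *u, const double *v, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (u[i] != v[i]) {
            return 0;
        }
    }
    return 1;
}
```

Exact comparison is reasonable here because each output element is computed independently by the same operations in both versions, unlike the reduction case.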

But be careful about assuming that iterators are private. Consider the following double loop

for(i=0; i<n; i++) {
    for(j=0; j<m; j++) {
       //
    }
}

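If you use #pragma omp parallel for, the i iterator will be made private but the j iterator will be shared. This is because parallel for applies only to the outer loop over i, and since j (declared outside the parallel region) is shared by default, it is not made private. In this case you need to declare j private explicitly, as in #pragma omp parallel for private(j).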
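A minimal sketch of the double-loop case, assuming a hypothetical function count_pairs that simply counts the inner iterations so the result is easy to check:

```c
/* Sketch only: count_pairs is an illustrative name. */
long count_pairs(int n, int m)
{
    long total = 0;
    int i, j;   /* both declared outside the parallel region */

    /* i is private automatically as the parallelized loop's iterator;
     * j has to be listed, otherwise all threads share one j. */
    #pragma omp parallel for private(j) reduction(+:total)
    for (i = 0; i < n; i++) {
        for (j = 0; j < m; j++) {
            total += 1;
        }
    }
    return total;
}
```

An alternative that also works in C89 is to declare int j; at the top of the outer loop body: variables declared inside the parallel region are private to each thread.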
