Question

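I am writing a spectral PDE code that I would like to parallelize, using FFTW for the FFTs. The main loop of the code looks like the one below. Suppose I have a real-space array, u, and a Fourier-space array, uhat, and that I have already constructed the plans between them outside of this for loop (and outside of any parallel region), along with the FFTW initialization functions required for parallelized calls to fftw_execute.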

for (...)
{
    fftw_execute(u_to_uhat);

    // do some things with uhat, the Fourier-space array, for
    // example scale the transform in a for loop

    for (int n = 0; n < Nmax; n++)
    {
        uhat[n] = scale * uhat[n];
    }

    // transform back

    fftw_execute(uhat_to_u);
}

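I would like to parallelize everything here, including the operations on uhat in the second for loop. My question is how I should use OpenMP #pragmas to do this. At the moment I have a parallel region around the inner loop: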

fftw_init_threads();
fftw_plan_with_nthreads(omp_get_max_threads());
for (...)
{
    fftw_execute(u_to_uhat);

    // do some things with uhat, the Fourier-space array, for
    // example scale the transform in a for loop
    #pragma omp parallel for
    for (int n = 0; n < Nmax; n++)
    {
        uhat[n] = scale * uhat[n];
    }

    // transform back

    fftw_execute(uhat_to_u);

}
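To make the scaling step concrete, here is a minimal, self-contained sketch of just that loop. It assumes the coefficients are stored as C99 `double complex`; as far as I know, FFTW only makes `fftw_complex` the native complex type when `<complex.h>` is included before `<fftw3.h>`, and otherwise it is a `double[2]`, for which `scale * uhat[n]` would not compile. The function name `scale_spectrum` is made up for illustration.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Multiply every Fourier coefficient by a real factor.
   Assumption: uhat holds C99 complex values (see the note above). */
void scale_spectrum(double complex *uhat, int nmax, double scale)
{
    assert(uhat != NULL && nmax >= 0);

    /* Each iteration touches a different element, so the loop can be
       split across threads. If the compiler is run without OpenMP
       support, the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (int n = 0; n < nmax; n++) {
        uhat[n] = scale * uhat[n];
    }
}
```

Calling `scale_spectrum(uhat, Nmax, scale)` between the two transforms would correspond to the inner loop above.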

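But as far as I understand, putting the parallel region around the inner loop creates and destroys a team of threads every time that loop is entered, which is expensive. I would rather construct the parallel region once, outside the for loop: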

fftw_init_threads();
fftw_plan_with_nthreads(omp_get_max_threads());
#pragma omp parallel
for (...)
{
    fftw_execute(u_to_uhat);

    // do some things with uhat, the Fourier-space array, for
    // example scale the transform in a for loop
    #pragma omp for
    for (int n = 0; n < Nmax; n++)
    {
        uhat[n] = scale * uhat[n];
    }

    // transform back

    fftw_execute(uhat_to_u);
}

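But then I have to worry about how fftw_execute behaves when called inside a parallel region. I think Section 5.4 of the documentation says that fftw_execute is thread-safe (i.e. it is safe to call from within a parallel region). But it doesn't tell me whether fftw_execute creates its own team of threads when called. That is, by placing it inside a parallel region, am I just making every existing thread construct a load more threads inside the fftw_execute function? Or does it know to use the threads that already exist?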
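To illustrate what I mean, here is a small sketch of the generic OpenMP situation, not of FFTW itself. `inner_team_size` is a hypothetical stand-in for a library routine that opens its own parallel region; I have not checked how FFTW's threaded execute is implemented. The sketch explicitly limits OpenMP to one active level of parallelism, in which case a region opened inside an already active region should get a team of one thread. Both function names are made up.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for a library call that opens its own
   parallel region. Returns the size of the team it ended up with. */
int inner_team_size(void)
{
    int size = 1;
    #pragma omp parallel
    {
        #pragma omp single
        {
#ifdef _OPENMP
            size = omp_get_num_threads();
#endif
        }
    }
    return size;
}

/* Call the stand-in from every thread of an outer region. Returns the
   largest inner team size seen; stores the outer team size in *outer_size. */
int probe_nested_team(int *outer_size)
{
    int largest = 0;
    int outer = 1;
#ifdef _OPENMP
    omp_set_max_active_levels(1);   /* allow only one active level */
#endif
    #pragma omp parallel reduction(max : largest)
    {
#ifdef _OPENMP
        #pragma omp single
        outer = omp_get_num_threads();
#endif
        int size = inner_team_size();
        if (size > largest) largest = size;
    }
    *outer_size = outer;
    return largest;
}
```

With nesting limited like this, each outer thread runs the inner region by itself, so the "library" work would not be shared among the existing threads. Whether that carries over to fftw_execute is exactly what I am unsure about.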

In short, can I call fftw_execute() from within a parallel region and have it work as hoped, i.e. use only the existing threads to do the work rather than spawn new ones?
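One variant I am considering is sketched below, under loud assumptions: the two `*_stub` functions are hypothetical placeholders for `fftw_execute(u_to_uhat)` and `fftw_execute(uhat_to_u)` that only count calls, and the array is plain `double` to keep the sketch short. Without some guard, every thread in the region would execute the transform call itself, on the same arrays. Here a `single` construct lets one thread make the call while the rest wait at its implicit barrier, and `omp for` shares the scaling loop among the existing team.

```c
#include <assert.h>

/* Hypothetical placeholders for the two fftw_execute calls. */
static int forward_calls = 0;
static int backward_calls = 0;
static void forward_transform_stub(void)  { forward_calls++; }
static void backward_transform_stub(void) { backward_calls++; }

/* One parallel region for the whole run; nsteps is an assumed loop bound. */
void run_steps(double *uhat, int nmax, double scale, int nsteps)
{
    assert(uhat != 0 && nmax >= 0 && nsteps >= 0);

    #pragma omp parallel
    {
        /* Every thread walks the same step loop, so all threads meet
           the work-sharing constructs in the same order. */
        for (int step = 0; step < nsteps; step++) {
            /* One thread calls the transform; the implicit barrier at
               the end of single holds the others until it is done. */
            #pragma omp single
            forward_transform_stub();

            /* Iterations are divided among the existing team; there is
               an implicit barrier at the end of the loop. */
            #pragma omp for
            for (int n = 0; n < nmax; n++) {
                uhat[n] = scale * uhat[n];
            }

            #pragma omp single
            backward_transform_stub();
        }
    }
}
```

This avoids re-creating the team for the scaling loop, but any parallel region opened inside the transform would then be nested, so I do not know whether the FFT itself would still run multithreaded.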

Sorry for the long question, and I'd really appreciate any advice!

Behavior of fftw_execute when called inside an OpenMP parallel region

0 answers: