Question

I have the following recursive program that I would like to parallelize using OpenMP:

#include <iostream>
#include <cmath>
#include <numeric>
#include <vector>
#include <algorithm>
#include <thread>
#include <omp.h>


// Determines if a point of dimension point.size() is within the sphere
bool isPointWithinSphere(std::vector<int> point, const double &radius) {

    // Since we know that the sphere is centered at the origin, we can simply
    // find the euclidean distance (square root of the sum of squares) and check to
    // see if it is less than or equal to the length of the radius 

    //square each element inside the point vector
    std::transform(point.begin(), point.end(), point.begin(), [](auto &x){return std::pow(x,2);});

    //find the square root of the sum of squares and check if it is less than or equal to the radius
    return std::sqrt(std::accumulate(point.begin(), point.end(), 0, std::plus<int>())) <= radius;    
}

// Counts the number of lattice points inside the sphere( all points (x1 .... xn) such that xi is an integer )

// The algorithm: If the radius is a floating point value, first find the floor of the radius and cast it to 
// an integer. For example, if the radius is 2.43 then the only integer points we must check are those between
// -2 and 2. We generate these points by simulating n - nested loops using recursion and passing each point
// in to the boolean function isPointWithinSphere(...), if the function returns true, we add one to the count
// (we have found a lattice point on the sphere). 

int countLatticePoints(std::vector<int> point, const double radius, const int dimension, int count = 0) {

    const int R = static_cast<int>(std::floor(radius));

    #pragma omp parallel for
    for(int i = -R; i <= R; i++) {
        point.push_back(i);

        if(point.size() == dimension){
            if(isPointWithinSphere(point, radius)) count++;
        }else count = countLatticePoints(point, radius, dimension, count);

        point.pop_back();

    }

    return count;
}

int main(int argc, char ** argv) {
    std::vector<int> vec;

    #pragma omp parallel
    std::cout << countLatticePoints(vec, 5, 7) << std::endl;   

    return 0;
}

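I tried adding a parallel region in main and parallelizing the for loop inside countLatticePoints, but I see hardly any improvement over running the algorithm sequentially. Any help or advice on other OpenMP strategies I could use would be appreciated.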

Answer 1

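I'll take the advice route. Before trying to speed the program up with threads, first make it faster in the single-threaded case. There are several improvements you can make. You are making a lot of copies of the point vector, and each copy incurs an expensive memory allocation.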

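Pass point to isPointWithinSphere by reference. Then, rather than two loops, use a single loop that squares each element of point and accumulates the sum. Finally, when checking against the radius, compare the squared distance with the squared radius rather than comparing the distance itself. This avoids the sqrt call and replaces it with a single multiplication.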
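A minimal sketch of that rewrite, assuming the point holds small integer coordinates so the sum of squares fits in a long long (the accumulator type is my choice, not something from the question):

```cpp
#include <vector>

// Returns true if the lattice point lies inside or on the sphere of the
// given radius centered at the origin. The point is taken by const
// reference, so no copy (and no allocation) happens per call.
bool isPointWithinSphere(const std::vector<int>& point, double radius) {
    long long sumOfSquares = 0;
    // One pass: square each coordinate and accumulate.
    for (int x : point) {
        sumOfSquares += static_cast<long long>(x) * x;
    }
    // Compare squared distance with squared radius; no sqrt needed.
    return static_cast<double>(sumOfSquares) <= radius * radius;
}
```

For a radius whose square is extremely close to an integer, floating-point rounding could make this disagree with the sqrt version exactly at the boundary, so treat it as a sketch rather than a drop-in guarantee.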

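countLatticePoints should also take point by reference. Rather than calling point.size(), subtract 1 from dimension each time you recurse, and then just check for dimension == 1 instead of computing the size.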
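The single-threaded version could then look roughly like the sketch below. The helper name countRecursive, the parameter name remaining, and the wrapper signature taking only radius and dimension are my own assumptions for illustration. Returning 0 for a non-positive dimension is also just a convention chosen here.

```cpp
#include <cmath>
#include <vector>

// Squared-distance test on a point passed by reference (no copy, no sqrt).
bool isPointWithinSphere(const std::vector<int>& point, double radius) {
    long long sumOfSquares = 0;
    for (int x : point) {
        sumOfSquares += static_cast<long long>(x) * x;
    }
    return static_cast<double>(sumOfSquares) <= radius * radius;
}

// Hypothetical helper: `remaining` is the number of coordinates still to
// be chosen. The same vector is reused at every level via push/pop.
long long countRecursive(std::vector<int>& point, double radius, int R,
                         int remaining) {
    long long count = 0;
    for (int i = -R; i <= R; ++i) {
        point.push_back(i);
        if (remaining == 1) {
            // Last coordinate chosen: the point is complete, test it.
            if (isPointWithinSphere(point, radius)) ++count;
        } else {
            count += countRecursive(point, radius, R, remaining - 1);
        }
        point.pop_back();
    }
    return count;
}

// Entry point: allocates the point vector once and starts the recursion.
long long countLatticePoints(double radius, int dimension) {
    if (dimension <= 0) return 0;  // convention assumed for this sketch
    const int R = static_cast<int>(std::floor(radius));
    std::vector<int> point;
    point.reserve(dimension);
    return countRecursive(point, radius, R, dimension);
}
```

A call such as `countLatticePoints(5.0, 7)` corresponds to the call in the question's main. The vector is allocated once and only grows to dimension elements, so the recursion itself performs no further allocations.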

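With all that done, if you still want or need to introduce threading, you will have to make some adjustments because point is now passed by reference: a single vector shared by all threads would be modified concurrently, so each thread needs its own. countLatticePoints needs two variants: the initial call, which contains the OpenMP directives, and the recursive one, which does not.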
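One way the two variants might be arranged is sketched below, assuming the program is compiled with OpenMP enabled (for example -fopenmp with GCC). The names countRecursive and countLatticePointsParallel are hypothetical, and using a reduction clause to combine the per-thread counts is my choice rather than something the answer prescribes.

```cpp
#include <cmath>
#include <vector>

// Squared-distance test on a point passed by reference.
bool isPointWithinSphere(const std::vector<int>& point, double radius) {
    long long sumOfSquares = 0;
    for (int x : point) {
        sumOfSquares += static_cast<long long>(x) * x;
    }
    return static_cast<double>(sumOfSquares) <= radius * radius;
}

// Recursive variant: no OpenMP directives. It only touches the vector it
// is handed, which belongs to a single thread.
long long countRecursive(std::vector<int>& point, double radius, int R,
                         int remaining) {
    long long count = 0;
    for (int i = -R; i <= R; ++i) {
        point.push_back(i);
        if (remaining == 1) {
            if (isPointWithinSphere(point, radius)) ++count;
        } else {
            count += countRecursive(point, radius, R, remaining - 1);
        }
        point.pop_back();
    }
    return count;
}

// Initial variant: the only place with OpenMP directives. The outermost
// coordinate is split across threads; partial counts are summed by the
// reduction clause instead of racing on a shared counter.
long long countLatticePointsParallel(double radius, int dimension) {
    if (dimension <= 0) return 0;  // convention assumed for this sketch
    const int R = static_cast<int>(std::floor(radius));
    long long count = 0;

    #pragma omp parallel for reduction(+ : count)
    for (int i = -R; i <= R; ++i) {
        // Declared inside the loop body, so each iteration (and therefore
        // each thread) works on its own vector.
        std::vector<int> point;
        point.reserve(dimension);
        point.push_back(i);
        if (dimension == 1) {
            if (isPointWithinSphere(point, radius)) ++count;
        } else {
            count += countRecursive(point, radius, R, dimension - 1);
        }
    }
    return count;
}
```

Only the outermost loop is parallel, so there are just 2R + 1 independent pieces of work (11 for a radius of 5), which bounds the achievable speedup. If the pragma is ignored because OpenMP is not enabled, the function still computes the same count sequentially.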

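The #pragma omp parallel in main does not help, because it encloses a single block of code with no work-sharing construct: there is nothing to divide among the threads, so every thread in the team simply executes that statement in full.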
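To see what a bare parallel region does, here is a small sketch (the function name countExecutions is hypothetical) that counts how many times the enclosed block runs. With OpenMP enabled the block runs once per thread in the team; with the pragmas ignored it runs once.

```cpp
// Counts how many times the body of a bare parallel region executes.
int countExecutions() {
    int executions = 0;
    #pragma omp parallel
    {
        // No work-sharing construct here: every thread runs this block.
        // The atomic directive keeps the shared increment race-free.
        #pragma omp atomic
        ++executions;
    }
    return executions;
}
```

Applied to the question's main, this means the whole count is computed, and printed, by each thread rather than being shared among them.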
