Question

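I'm trying to learn OpenMP so I can parallelize parts of my code, and I'm trying to figure out why it isn't any faster with 2 threads than with 1. Here's a minimal working example of the code: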

#include <iostream>
#include <omp.h>

using namespace std;

class My_class
{
    public :

        // Constructor
        My_class(int nuIterations) 
            : prVar_(0),
              nuIters_(nuIterations)
        {} // Empty

        // Do something expensive involving the class' private vars
        void do_calculations()
        {
            for (int i=0;i<nuIters_;++i){
                prVar_=prVar_+i+2*i+3*i+4*i-5*i-4*i;
            }
        }

        // Retrieve result
        double getResult()
        {
            return prVar_;
        }

    private:

        double prVar_;
        int nuIters_;

};

int main()
{
    // Initialize one object for every thread
    My_class *test_object1, *test_object2;
    test_object1 = new My_class(1000000000);
    test_object2 = new My_class(500000000);

    // Set number of threads (use one line at a time)
    omp_set_num_threads(1); // One thread executes in 11.5 real seconds
    //omp_set_num_threads(2); // Two threads execute in 13.2 real seconds
    double start = omp_get_wtime(); // Start timer
#pragma omp parallel sections // Do calculations in parallel
    {
#pragma omp section
        {
            test_object1->do_calculations();
        }
#pragma omp section
        {
            test_object2->do_calculations();
        }
    }// End of parallel sections
    // Print results
    double end = omp_get_wtime();
    cout<<"Res 1 : "<<test_object1->getResult()<<endl;
    cout<<"Res 2 : "<<test_object2->getResult()<<endl;
    cout<<"Time  : "<<end-start<<endl;

    return 0;
}

Compiling it with g++ myomp.cpp -O0 -std=c++11 -fopenmp and running it gives the following execution times for 1 and 2 threads:

1 thread: 11.5 seconds
2 threads: 13.2 seconds

Is there any way to make it faster with 2 threads? I'm running this on a 4-core Intel i7-4600U under Ubuntu.

Edit: Rewrote most of the post so that it follows the guidelines.

Answer 1

There are two effects at play here:

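Cache-line contention: You allocate two very small objects in dynamic memory. If they end up on the same cache line (usually 64 bytes), the threads that update prVar_ compete for the level-1 cache, because each needs exclusive (write) access to that line. You should see this vary randomly: sometimes the run is noticeably faster or slower, depending on where the objects land in memory. Try printing the pointer addresses and dividing them by 64. To fix this, you need to pad / align the memory.
Your load is very unbalanced. One task does twice as much work as the other, so even under ideal conditions you can only reach a speedup of 1.5: run serially, the two tasks take 1 + 0.5 = 1.5 units of time (1 billion plus 0.5 billion iterations), while run in parallel you still have to wait for the larger task, which takes 1 unit.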
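A minimal sketch of the pad/align fix, assuming 64-byte cache lines (typical for x86, but not guaranteed everywhere). It keeps the question's My_class but marks it alignas(64), so every object starts on its own cache line and sizeof(My_class) is rounded up to 64. The helpers kCacheLineSize, cache_line_index and run_both are hypothetical names introduced here for illustration; cache_line_index is just the "divide the address by 64" check from the answer. The loop counter is widened to long long, because in the original code 4*i overflows an int once i exceeds about 537 million.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (hypothetical constant, not queried from the CPU).
constexpr std::size_t kCacheLineSize = 64;

// alignas() makes each object start on a cache-line boundary, and because
// sizeof must be a multiple of the alignment, sizeof(My_class) becomes 64.
// Two distinct My_class objects therefore never share a cache line.
class alignas(kCacheLineSize) My_class
{
    public:
        explicit My_class(long long nuIterations)
            : prVar_(0), nuIters_(nuIterations) {}

        // Same arithmetic as the question (the terms sum to i), but with a
        // 64-bit counter so that 3*i, 4*i and 5*i cannot overflow.
        void do_calculations()
        {
            for (long long i = 0; i < nuIters_; ++i) {
                prVar_ = prVar_ + i + 2*i + 3*i + 4*i - 5*i - 4*i;
            }
        }

        double getResult() const { return prVar_; }

    private:
        double prVar_;
        long long nuIters_;
};

// Hypothetical helper: address / 64 is the cache-line index. Equal indices
// for two objects mean they share a line (false sharing).
std::uintptr_t cache_line_index(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLineSize;
}

// Hypothetical helper: runs both calculations in two OpenMP sections, as in
// the question. Without -fopenmp the pragmas are ignored and it runs serially.
void run_both(My_class& a, My_class& b)
{
    // Debug-time check that the two objects really sit on different lines.
    assert(cache_line_index(&a) != cache_line_index(&b));
#pragma omp parallel sections
    {
#pragma omp section
        a.do_calculations();
#pragma omp section
        b.do_calculations();
    }
}
```

In the question's main you could declare the objects as locals (My_class test_object1(1000000000), test_object2(500000000);) and call run_both(test_object1, test_object2). Locals are used here because with -std=c++11, as in the question, new is only required to honor alignments up to alignof(std::max_align_t) (typically 16 bytes); C++17 added aligned new, so new My_class(...) is also correctly aligned when compiling with -std=c++17. Padding removes only the contention; the 2:1 work split still limits the speedup to 1.5.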

Not getting the expected speedup with OpenMP on a non-trivial computation
