Multithreaded generic matrix addition is very slow when using a custom class

Posted: 2018-10-22 19:24:45

Tags: c++ multithreading performance

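I implemented the addition of two generic matrices with both a multithreaded and a sequential algorithm. I tested my program with two large (2000x2000) matrices of real numbers (double), and the results were very good: the operation completed very quickly. Later, I implemented a class representing complex numbers and tried to repeat the same scenario with two such matrices, and I found that even for two 50x50 matrices the whole process takes a noticeable amount of time. What can I do to improve the execution time?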

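This is the method that creates the threads (first, I convert both matrices into one-dimensional arrays so that it is easier to give each thread its start and end positions):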

template<typename T, typename Func>
Matrix<T> *calculateLinearDistribution(Matrix<T> *matrix1,
                                       Matrix<T> *matrix2,
                                       Func operation,
                                       int nThreads) {
    const int n = matrix1->getN(), m = matrix2->getM(), totalNumbers = n * m;
    Matrix<T> *result = new Matrix<T>(n, m);
    T *matrix1Unidim = new T[totalNumbers];
    T *matrix2Unidim = new T[totalNumbers];
    convertMatrixToUnidimensionalArray(matrix1, matrix1Unidim);
    // the second flat array is filled from matrix2
    convertMatrixToUnidimensionalArray(matrix2, matrix2Unidim);
    if (totalNumbers < nThreads) {
        nThreads = totalNumbers;
    }
    const int quantityPerThread = totalNumbers / nThreads;
    int rest = totalNumbers % nThreads;
    int start = 0, end = 0;
    std::vector<std::thread> threads;
    std::chrono::milliseconds startTime = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::system_clock::now().time_since_epoch());
    for (int i = 0; i < nThreads; i++) {
        end += quantityPerThread;
        if (rest > 0) {
            end++;
            rest--;
        }
        threads.push_back(std::thread(MultithreadedMethods<T, Func>::linearElementsDistribution, &matrix1Unidim[0],
                                      &matrix2Unidim[0], result, start, end, operation));
        start = end;
    }
    for (int i = 0; i < nThreads; i++) {
        threads[i].join();
    }
    std::chrono::milliseconds endTime = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::system_clock::now().time_since_epoch());
    std::ofstream out(linearElemensStatisticsFile, std::ios_base::app);
    std::chrono::milliseconds time = endTime - startTime;
    out << "Dimensiune matrice: " << matrix1->getN() << "x" << matrix1->getM()
        << " | Nr. threads: " << nThreads << " | Timp de executie: " << time.count() << std::endl;
    out.close();
    delete[] matrix1Unidim;
    delete[] matrix2Unidim;
    return result;
}
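The `start`/`end` bookkeeping in the loop above splits `totalNumbers` into `nThreads` contiguous half-open ranges and gives the first `rest` threads one extra element, so range sizes differ by at most one. A standalone sketch of just that logic, using a hypothetical helper name `splitRange` that is not part of the original code:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not from the question): reproduces the range
// bookkeeping of calculateLinearDistribution. The first `rest` ranges get
// one extra element, so range sizes differ by at most one.
std::vector<std::pair<int, int>> splitRange(int totalNumbers, int nThreads) {
    assert(totalNumbers > 0 && nThreads > 0);
    if (totalNumbers < nThreads) {
        nThreads = totalNumbers;  // never create a thread with nothing to do
    }
    const int quantityPerThread = totalNumbers / nThreads;
    int rest = totalNumbers % nThreads;
    std::vector<std::pair<int, int>> ranges;
    int start = 0, end = 0;
    for (int i = 0; i < nThreads; i++) {
        end += quantityPerThread;
        if (rest > 0) {
            end++;
            rest--;
        }
        ranges.emplace_back(start, end);  // half-open range [start, end)
        start = end;
    }
    return ranges;
}
```

For example, `splitRange(10, 4)` yields `[0,3)`, `[3,6)`, `[6,8)`, `[8,10)`.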

This is the function given to each thread:

template<typename T, typename Func>
void MultithreadedMethods<T, Func>::linearElementsDistribution(T *matrix1,
                                                               T *matrix2,
                                                               Matrix<T> *result,
                                                               int start,
                                                               int end,
                                                               Func operation) {
    const int m = result->getM();
    for (int i = start; i < end; i++) {
        result->getElements()[i / m][i % m] = operation(matrix1[i], matrix2[i]);
    }
}
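Put together, the thread-creation method and the worker above amount to a chunked, parallel element-wise operation. A minimal self-contained sketch of the same idea, assuming both matrices are already stored as equally sized, row-major `std::vector<T>`s (the name `parallelElementwise` is hypothetical and not from the question):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: applies `operation` element-wise to two equally sized,
// row-major flat arrays, one contiguous chunk per thread. Each thread writes
// a disjoint range of `result`, so no locking is needed.
template <typename T, typename Func>
std::vector<T> parallelElementwise(const std::vector<T>& a,
                                   const std::vector<T>& b,
                                   Func operation,
                                   unsigned nThreads) {
    assert(a.size() == b.size() && nThreads > 0);
    std::vector<T> result(a.size());
    const std::size_t total = a.size();
    nThreads = static_cast<unsigned>(std::min<std::size_t>(nThreads, total));
    if (nThreads == 0) return result;  // empty input, nothing to do
    const std::size_t chunk = total / nThreads;
    std::size_t rest = total % nThreads;

    std::vector<std::thread> threads;
    std::size_t start = 0;
    for (unsigned t = 0; t < nThreads; ++t) {
        const std::size_t end = start + chunk + (rest > 0 ? 1 : 0);
        if (rest > 0) --rest;
        // `operation` is captured by value, so each thread has its own copy
        threads.emplace_back([&a, &b, &result, operation, start, end]() {
            for (std::size_t i = start; i < end; ++i) {
                result[i] = operation(a[i], b[i]);
            }
        });
        start = end;
    }
    for (auto& th : threads) th.join();
    return result;
}
```

Note that for element counts as small as 50x50, the cost of creating and joining threads can easily outweigh the work each thread does.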

This is how I run it with real numbers (very fast):

Matrix<double> *linearDistributionResult = calculateLinearDistribution(
        matrix1, matrix2,
        [](double a, double b) { return a + b; },
        nThreads);

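Finally, here is the worst part, where I try it with complex numbers: compared to the sequential version, it takes a very long time or even fails...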

Matrix<ComplexNumber> *linearDistributionResult = calculateLinearDistribution(
        matrix1, matrix2,
        [](ComplexNumber a, ComplexNumber b) {
            return ComplexNumber(a.getRealComponent() + b.getRealComponent(),
                                 a.getImaginaryComponent() + b.getImaginaryComponent());
        },
        nThreads);
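For comparison, a hedged sketch of the same element-wise addition with the standard `std::complex<double>` swapped in for the custom `ComplexNumber`. Because `std::complex` already provides `operator+`, the operation can be a one-line generic lambda that takes its arguments by const reference. The helper name `addComplex` is hypothetical, and the loop is sequential to keep the example short:

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Generic operation usable as the `operation` argument: it takes both
// arguments by const reference, so the lambda's parameters are not copies.
const auto addOp = [](const auto& a, const auto& b) { return a + b; };

// Hypothetical helper: sequential element-wise addition over flat,
// row-major arrays of std::complex<double>.
std::vector<std::complex<double>> addComplex(const std::vector<std::complex<double>>& a,
                                             const std::vector<std::complex<double>>& b) {
    assert(a.size() == b.size());
    std::vector<std::complex<double>> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = addOp(a[i], b[i]);
    }
    return out;
}
```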

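And of course, this is the sequential implementation (I should point out that it is also very slow with complex numbers compared to real numbers):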

template<typename T, typename Func>
Matrix<T> *calculateSequentialResult(Matrix<T> *matrix1,
                                     Matrix<T> *matrix2,
                                     Func operation) {
    const int n = matrix1->getN(), m = matrix1->getM();
    Matrix<T> *result = new Matrix<T>(n, m);
    std::chrono::milliseconds startTime = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::system_clock::now().time_since_epoch());
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            result->getElements()[i][j] = operation(matrix1->getElements()[i][j], matrix2->getElements()[i][j]);
        }
    }
    std::chrono::milliseconds endTime = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::system_clock::now().time_since_epoch());
    std::ofstream out(sequentialElementsStatistics, std::ios_base::app);
    std::chrono::milliseconds time = endTime - startTime;
    out << "Dimensiune matrice: " << matrix1->getN() << "x" << matrix1->getM()
        << " | Nr. threads: 1 | Timp de executie: " << time.count() << std::endl;
    out.close();
    return result;
}
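Both functions measure elapsed time with `std::chrono::system_clock`, which can be adjusted while the program runs. A minimal sketch of a timing helper that uses `std::chrono::steady_clock` instead, which is monotonic and intended for measuring intervals (the name `timeMs` is hypothetical):

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed time in
// whole milliseconds. steady_clock never goes backwards, unlike system_clock,
// so it is the usual choice for interval measurement.
template <typename Work>
long long timeMs(Work&& work) {
    const auto startTime = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto endTime = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
}
```

With millisecond resolution, a fast run on a small matrix may report 0; `std::chrono::microseconds` gives finer results.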

Update: here are the results of profiling the execution with Very Sleepy:

[Screenshot: Very Sleepy profiler output]

The ComplexNumber class:

   class ComplexNumber {
    private:
        double realComponent;
        double imaginaryComponent;

    public:

        ComplexNumber() {}

        ComplexNumber(const ComplexNumber &complexNumber);

        double getRealComponent() const;

        ComplexNumber(double realComponent, double imaginaryComponent);

        void setRealComponent(double realComponent);

        double getImaginaryComponent() const;

        void setImaginaryComponent(double imaginaryComponent);

        friend std::ostream &operator<<(std::ostream &os, const ComplexNumber &complexNumber);
    };

And the definitions:

double ComplexNumber::getRealComponent() const {
    return realComponent;
}

void ComplexNumber::setRealComponent(double realComponent) {
    ComplexNumber::realComponent = realComponent;
}

double ComplexNumber::getImaginaryComponent() const {
    return imaginaryComponent;
}

void ComplexNumber::setImaginaryComponent(double imaginaryComponent) {
    ComplexNumber::imaginaryComponent = imaginaryComponent;
}

ComplexNumber::ComplexNumber(double realComponent, double imaginaryComponent) : realComponent(realComponent),
                                                                                imaginaryComponent(imaginaryComponent) {

}

ComplexNumber::ComplexNumber(const ComplexNumber &complexNumber) {
    this->imaginaryComponent = complexNumber.imaginaryComponent;
    this->realComponent = complexNumber.realComponent;
}

std::ostream &operator<<(std::ostream &os, const ComplexNumber &complexNumber) {
    if (complexNumber.imaginaryComponent == 0) {
        os << std::to_string(complexNumber.realComponent);
    } else if (complexNumber.realComponent == 0) {
        os << std::to_string(complexNumber.imaginaryComponent) + "i";
    } else {
        // std::to_string already includes the '-' for a negative value,
        // so only a non-negative imaginary part needs an explicit '+'
        os << std::to_string(complexNumber.realComponent)
           << (complexNumber.imaginaryComponent < 0 ? "" : "+")
           << std::to_string(complexNumber.imaginaryComponent) << "i";
    }
    return os;
}

SOLVED

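The problem was that I was parsing the complex numbers from a file with regular expressions, which were very slow. After replacing them, I managed to get the correct behavior.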
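The question does not show the file format or the replacement parser. As an illustration only, here is a regex-free sketch built on `std::strtod`, assuming each token looks like `3.5+2i`, `3.5-2i`, `2i` or a plain real such as `4` (no embedded spaces). The name `parseComplex` is hypothetical:

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical regex-free parser. ASSUMES tokens like "3.5+2i", "3.5-2i",
// "2i" or "4". Returns {real, imaginary}; throws std::invalid_argument otherwise.
std::pair<double, double> parseComplex(const std::string& token) {
    const char* begin = token.c_str();
    char* pos = nullptr;
    const double first = std::strtod(begin, &pos);  // stops at '+', '-' or 'i'
    if (pos == begin) throw std::invalid_argument("no number in: " + token);
    if (*pos == '\0') return {first, 0.0};                           // e.g. "4"
    if (*pos == 'i' && *(pos + 1) == '\0') return {0.0, first};      // e.g. "2i"
    const char* imagBegin = pos;
    const double imaginary = std::strtod(imagBegin, &pos);  // reads "+2" or "-2"
    if (pos == imagBegin || *pos != 'i' || *(pos + 1) != '\0') {
        throw std::invalid_argument("bad imaginary part in: " + token);
    }
    return {first, imaginary};
}
```

`std::strtod` scans each number in a single pass, without the overhead of building and running a regular expression.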

1 answer:

Answer 0 (score: 0)

Rewrite:

struct ComplexNumber {
    double real; // *maybe* = 0
    double imaginary; // *maybe* = 0

    ComplexNumber( double r, double i ):real(r), imaginary(i) {}

    ComplexNumber() = default;
    ComplexNumber(const ComplexNumber &complexNumber) = default;
    ComplexNumber& operator=(const ComplexNumber &complexNumber) = default;

};
std::ostream &operator<<(std::ostream &os, const ComplexNumber &complexNumber);

`<<` is probably slow, and it doesn't need to be a friend. Stop using accessors (especially non-inline ones) to access your fields.
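The answer declares `operator<<` but does not define it. One possible definition, offered as a sketch: it writes the doubles straight to the stream instead of building temporary `std::string`s with `std::to_string`, and it needs no friendship because the fields are public. The struct is repeated from the rewrite above so the block compiles on its own; output uses the stream's default formatting (`1.5`, not `1.500000`):

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// Repeated from the answer's first rewrite so this block is self-contained.
struct ComplexNumber {
    double real;
    double imaginary;

    ComplexNumber(double r, double i) : real(r), imaginary(i) {}

    ComplexNumber() = default;
    ComplexNumber(const ComplexNumber& complexNumber) = default;
    ComplexNumber& operator=(const ComplexNumber& complexNumber) = default;
};

// Writes the fields directly to the stream; no temporary strings are built.
std::ostream& operator<<(std::ostream& os, const ComplexNumber& c) {
    if (c.imaginary == 0) return os << c.real;
    if (c.real == 0) return os << c.imaginary << 'i';
    // a negative imaginary part already carries its own '-' sign
    return os << c.real << (c.imaginary < 0 ? "" : "+") << c.imaginary << 'i';
}
```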

If you really do need accessors, at least make them inline and put them in the header. But here they are pointless.
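If the accessors are kept, a sketch of what "inline and in the header" could look like: member functions defined inside the class body are implicitly inline, so every translation unit that includes the header sees their bodies and the compiler can inline the calls. Member names follow the question; the zero defaults are an addition (the question's default constructor left the fields uninitialized):

```cpp
#include <cassert>

// Accessors defined in the class body are implicitly inline.
class ComplexNumber {
public:
    ComplexNumber() = default;
    ComplexNumber(double realComponent, double imaginaryComponent)
        : realComponent(realComponent), imaginaryComponent(imaginaryComponent) {}

    double getRealComponent() const { return realComponent; }
    double getImaginaryComponent() const { return imaginaryComponent; }
    void setRealComponent(double value) { realComponent = value; }
    void setImaginaryComponent(double value) { imaginaryComponent = value; }

private:
    double realComponent = 0.0;       // default added in this sketch
    double imaginaryComponent = 0.0;  // default added in this sketch
};
```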

I'd write `operator+` and friends even if I didn't need them, because why not?

struct ComplexNumber {
    double real; // *maybe* = 0
    double imaginary; // *maybe* = 0

    ComplexNumber( double r, double i ):real(r), imaginary(i) {}

    ComplexNumber() = default;
    ComplexNumber(const ComplexNumber &complexNumber) = default;
    ComplexNumber& operator=(const ComplexNumber &complexNumber) = default;

  ComplexNumber& operator+=( ComplexNumber const& o )& {
    real += o.real;
    imaginary += o.imaginary;
    return *this;
  }
  ComplexNumber& operator-=( ComplexNumber const& o )& {
    real -= o.real;
    imaginary -= o.imaginary;
    return *this;
  }
  ComplexNumber& operator*=( ComplexNumber const& o )& {
    ComplexNumber r{ real*o.real - imaginary*o.imaginary, real*o.imaginary + imaginary*o.real };
    *this = r;
    return *this;
  }

  friend ComplexNumber operator+( ComplexNumber lhs, ComplexNumber const& rhs ) {
    lhs += rhs;
    return lhs;
  }
  friend ComplexNumber operator-( ComplexNumber lhs, ComplexNumber const& rhs ) {
    lhs -= rhs;
    return lhs;
  }
  friend ComplexNumber operator*( ComplexNumber lhs, ComplexNumber const& rhs ) {
    lhs *= rhs;
    return lhs;
  }
};
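The arithmetic behind `operator+=` and `operator*=` above, writing `real = a`, `imaginary = b`, `o.real = c`, `o.imaginary = d`:

```latex
(a + bi) + (c + di) = (a + c) + (b + d)\,i
(a + bi)(c + di) = (ac - bd) + (ad + bc)\,i
\text{e.g. } (1 + 2i)(3 + 4i) = (3 - 8) + (4 + 6)\,i = -5 + 10i
```

The real part `real*o.real - imaginary*o.imaginary` is `ac - bd`, and the imaginary part `real*o.imaginary + imaginary*o.real` is `ad + bc`.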

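This is brain-dead boilerplate, but without at least this much I can't justify having a ComplexNumber type at all. (I left out `/`, because there is still an important decision to make about how to handle division by zero.)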

Anyway, once we stop hiding from the code doing the work how the data is accessed, the optimizer finally has a chance to actually optimize.
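One checkable difference between the two designs, shown as a sketch: defaulted copy operations keep the answer's type trivially copyable, while a hand-written copy constructor like the question's makes it non-trivially copyable. In the question that constructor was also defined out of line, so each copy could become a call the optimizer cannot see into from other translation units. `HandWrittenCopy` is a hypothetical stand-in for the question's class, and only `+` is repeated from the answer:

```cpp
#include <cassert>
#include <type_traits>

// Compact copy of the answer's type (only + is repeated here).
struct ComplexNumber {
    double real;
    double imaginary;

    ComplexNumber(double r, double i) : real(r), imaginary(i) {}
    ComplexNumber() = default;
    ComplexNumber(const ComplexNumber&) = default;
    ComplexNumber& operator=(const ComplexNumber&) = default;

    ComplexNumber& operator+=(ComplexNumber const& o) & {
        real += o.real;
        imaginary += o.imaginary;
        return *this;
    }
    friend ComplexNumber operator+(ComplexNumber lhs, ComplexNumber const& rhs) {
        lhs += rhs;
        return lhs;
    }
};

// Shaped like the question's class: the copy constructor is user-provided.
struct HandWrittenCopy {
    double real = 0.0;
    double imaginary = 0.0;
    HandWrittenCopy() = default;
    HandWrittenCopy(const HandWrittenCopy& other)
        : real(other.real), imaginary(other.imaginary) {}
};

static_assert(std::is_trivially_copyable<ComplexNumber>::value,
              "defaulted copy operations keep the type trivially copyable");
static_assert(!std::is_trivially_copyable<HandWrittenCopy>::value,
              "a user-provided copy constructor is not trivial");

// The question's lambda, rewritten against operator+.
const auto addComplex = [](ComplexNumber a, ComplexNumber b) { return a + b; };
```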