Can anyone suggest a faster way to multiply a matrix by a vector inside this function?
#include <vector>

inline void multiply(
    const std::vector< std::vector<double> > &matrix,
    const std::vector<double> &vector,
    std::vector<double> &result
){
    int size = (int) vector.size();
    // assign() rather than resize(): the loops below accumulate with +=,
    // so every element must start at zero, even if result is reused.
    result.assign(size, 0.0);
    #pragma omp parallel for
    for(int i = 0; i < size; ++i){
        int j = 0;
        // Main loop, manually unrolled 16 times.
        for(; j <= size - 16; j += 16){
            result[i] += matrix[i][j] * vector[j]
                       + matrix[i][j + 1] * vector[j + 1]
                       + matrix[i][j + 2] * vector[j + 2]
                       + matrix[i][j + 3] * vector[j + 3]
                       + matrix[i][j + 4] * vector[j + 4]
                       + matrix[i][j + 5] * vector[j + 5]
                       + matrix[i][j + 6] * vector[j + 6]
                       + matrix[i][j + 7] * vector[j + 7]
                       + matrix[i][j + 8] * vector[j + 8]
                       + matrix[i][j + 9] * vector[j + 9]
                       + matrix[i][j + 10] * vector[j + 10]
                       + matrix[i][j + 11] * vector[j + 11]
                       + matrix[i][j + 12] * vector[j + 12]
                       + matrix[i][j + 13] * vector[j + 13]
                       + matrix[i][j + 14] * vector[j + 14]
                       + matrix[i][j + 15] * vector[j + 15];
        }
        // Remainder loop for the last (size % 16) columns.
        for(; j < size; ++j){
            result[i] += matrix[i][j] * vector[j];
        }
    }
}
This function is called many times at runtime, so it has a critical impact on the total computation time.
Answer 0 (score: 0)
Depending on your hardware, GPU parallelization (e.g., CUDA) may help considerably.
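If a GPU is not available, a CPU-side change may be worth measuring first. The sketch below is not from the question: it assumes the matrix can be stored as one contiguous row-major buffer instead of a vector of vectors, and it sums each row into a local variable so the compiler is free to vectorize the inner loop. The name `multiply_flat` and the explicit `rows` and `cols` parameters are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: `matrix` holds rows * cols values in row-major order,
// so row i occupies matrix[i * cols] .. matrix[i * cols + cols - 1].
inline void multiply_flat(const std::vector<double>& matrix,
                          const std::vector<double>& vec,
                          std::vector<double>& result,
                          std::size_t rows, std::size_t cols) {
    assert(matrix.size() == rows * cols);
    assert(vec.size() == cols);
    result.assign(rows, 0.0);  // overwrite any stale contents

    const double* m = matrix.data();
    const double* v = vec.data();
    double* r = result.data();
    const long n = static_cast<long>(rows);

    // The pragma is ignored (possibly with a warning) when OpenMP is disabled.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        const double* row = m + static_cast<std::size_t>(i) * cols;
        double sum = 0.0;  // local accumulator, stored to memory once per row
        for (std::size_t j = 0; j < cols; ++j) {
            sum += row[j] * v[j];
        }
        r[i] = sum;
    }
}
```

With GCC or Clang this would typically be built with optimization and OpenMP enabled (for example `-O3 -fopenmp`). Whether it beats the hand-unrolled version depends on the matrix size and the machine, so it should be benchmarked rather than assumed faster.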