Using the vForce library from Apple's Accelerate Framework to improve performance

Time: 2016-04-01 15:43:28

Tags: c++ performance accelerate-framework

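I have successfully used the BLAS library from Apple's Accelerate Framework to improve the performance of my basic vector and matrix operations.

Pleased with this, I turned my attention to vForce to vectorise my basic math functions. I was somewhat surprised to find that it performed considerably worse than a naive implementation (compiled with automatic compiler optimisation, -Os).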

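As a simple benchmark, I ran the following test. Matrix is a basic matrix type that holds its data through a pointer to double, and AccelerateMatrix is a subclass of Matrix that uses the exponentiation function from vForce: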

Matrix A(vec_size);
AccelerateMatrix B(vec_size);
for (int i=0; i<vec_size;i++ ) {
    A[i] = i;
    B[i] = i;
}

double elapsed_time;

clock_t start = clock();
for(int i=0;i<reps;i++){
    A.exp();
    A.log();
}
clock_t stop = clock();

elapsed_time = (double)(stop-start)/CLOCKS_PER_SEC/reps;

cerr << "Basic matrix exponentiation/log time = " << elapsed_time << endl;


start = clock();
for(int i=0;i<reps;i++){
    B.exp();
    B.log();
}
stop = clock();

elapsed_time = (double)(stop-start)/CLOCKS_PER_SEC/reps;

cerr << "Accelerate matrix exponentiation/log time = " << elapsed_time << endl;

The exponentiate/log member functions are implemented as follows:

void AccelerateMatrix::exp(){
    int size = (int)this->getSize();
    this->goToStart();
    vvexp(this->ptr, this->ptr, &size);
}

void Matrix::exp(){
    double *ptr = data;
    while (!atEnd()) {
        *ptr = std::exp(*ptr);
        ptr++;
    }
}

data is a pointer to the first element of an array of doubles.

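Here is the performance output:

Number of matrix elements = 1000000

Basic matrix exponentiation/log time (seconds) = 0.0089806

Accelerate matrix exponentiation/log time (seconds) = 0.0149955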

I am running from Xcode in release mode. My processor is a 2.3 GHz Intel Core i7, with 8 GB of 1600 MHz DDR3 memory.

1 answer:

Answer 0 (score: 0)

The issue appears to be related to how vForce handles memory: it does not cope well with very large arrays in a single call. For vec_size = 1000, vForce computes the exponential/log twice as fast as the compiler-optimised naive version. I split the larger example (vec_size = 1000000) into batches of 1000 elements each and, lo and behold, the vForce implementation was twice as fast as the naive one. Nice!
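The batching described above might look roughly like the sketch below. The names apply_in_batches, ArrayKernel and exp_kernel are hypothetical and not taken from the question's Matrix classes. exp_kernel is only a portable stand-in with the same parameter shape as vvexp, so that the sketch builds without Accelerate. The batch size of 1000 is simply the value reported above, not a tuned constant.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Parameter shape used by vForce array kernels such as vvexp and vvlog:
// (output array, input array, pointer to element count).
using ArrayKernel = void (*)(double*, const double*, const int*);

// Hypothetical stand-in kernel so the sketch compiles without Accelerate.
// On macOS one would pass vvexp from <Accelerate/Accelerate.h> instead.
void exp_kernel(double* out, const double* in, const int* n) {
    for (int i = 0; i < *n; ++i) {
        out[i] = std::exp(in[i]);
    }
}

// Apply `kernel` in place over `data`, handing it at most `batch`
// elements per call instead of the whole array at once.
void apply_in_batches(std::vector<double>& data, ArrayKernel kernel, int batch = 1000) {
    assert(batch > 0);
    for (std::size_t offset = 0; offset < data.size(); offset += batch) {
        // The final batch may be shorter than `batch`.
        const int n = static_cast<int>(
            std::min<std::size_t>(batch, data.size() - offset));
        kernel(data.data() + offset, data.data() + offset, &n);
    }
}
```

With Accelerate included and linked, a call such as apply_in_batches(values, vvexp) should take the place of the single whole-array vvexp call in AccelerateMatrix::exp. The best batch size is likely to depend on the machine, so it is worth measuring a few values.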