Let:
[220, 220] = size(M); and [208, 208] = size(N).
The following command takes ~0.025 s:
tic;C = conv2(M,N,'valid');toc
[13,13]=size(C);
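To put the timing in context, here is a minimal sketch of the arithmetic behind a direct 'valid' convolution at these sizes. The helper names `validSize` and `directMultiplyAdds` are hypothetical and only serve the illustration; square inputs are assumed, as in the sizes given above.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of the original post.

// Side length of a 'valid' convolution output along one dimension.
constexpr std::int64_t validSize(std::int64_t signal, std::int64_t kernel) {
    return signal >= kernel ? signal - kernel + 1 : 0;
}

// Multiply-adds needed by a direct 'valid' 2-D convolution of a square
// signal with a square kernel: one kernel-sized dot product per output.
constexpr std::int64_t directMultiplyAdds(std::int64_t signal, std::int64_t kernel) {
    return validSize(signal, kernel) * validSize(signal, kernel) * kernel * kernel;
}
```

For the sizes above, `validSize(220, 208)` gives 13, which matches `size(C)`, and `directMultiplyAdds(220, 208)` gives 7,311,616, so the direct method only has to evaluate 169 output elements.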
I want to speed this up, so I downloaded some code from the Internet, such as Conv_fast and similar functions.
I tested several of them, including some that use the FFT. None of them is faster than 0.025 s, and some take 0.2 s.
So is it possible to speed this up?
As mentioned above, I tried the following function:
function g = conv2FFT(h, f)
% g = conv2FFT(h, f)
%
% DESC:
% computes the 2D convolution via FFT
%
% AUTHOR
% Marco Zuliani - zuliani@ece.ucsb.edu
%
% VERSION:
% 1.0.0
%
% INPUT:
% h = convolution kernel
% f = input signal
%
% OUTPUT:
% g = output signal
%
% HISTORY
% 1.0.0 ??/??/07 Initial version
sh = size(h);
sf = size(f);
% zero pad the input signals
fm = zeros(sf+2*(sh-1), class(f));
o = sh-1;
fm( o(1)+(1:size(f,1)), o(2)+(1:size(f,2)) ) = f;
h_zp = zeros(size(fm), class(h));
h_zp(1:size(h,1), 1:size(h,2)) = h;
% compute the convolution in frequency
F = fft2(fm);
H = fft2(h_zp);
Y = F.*H;
% back to spatial domain
g = real( ifft2(Y) );
% remove padding
o = floor(1.5*size(h))-1;
g = g( o(1)+(1:size(f,1)), o(2)+(1:size(f,2)) );
return
But it takes about 0.1 s.
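One possible contributor to that timing is the size of the padded arrays. The function above pads `f` to `size(f) + 2*(size(h) - 1)` and runs two forward FFTs and one inverse FFT at that size, then returns an output the size of `f` rather than the 13 x 13 'valid' result. The sketch below computes the padded side length and its largest prime factor, since FFT libraries generally perform best when the length has only small prime factors. It assumes the call was `conv2FFT(N, M)`, which the post does not state; `paddedSide` and `largestPrimeFactor` are hypothetical names.

```cpp
// Hypothetical helpers for illustration; not part of the original post.

// Padded side length used by conv2FFT: size(f) + 2*(size(h) - 1).
constexpr long paddedSide(long f, long h) {
    return f + 2 * (h - 1);
}

// Largest prime factor of n (n >= 2), found by trial division.
long largestPrimeFactor(long n) {
    long largest = 1;
    for (long p = 2; p * p <= n; ++p) {
        while (n % p == 0) {
            largest = p;
            n /= p;
        }
    }
    if (n > 1) {
        largest = n;  // whatever remains is itself prime
    }
    return largest;
}
```

Under that assumption, `paddedSide(220, 208)` is 634 and `largestPrimeFactor(634)` is 317, so each transform runs on a 634 x 634 array whose length is far from a power of two.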
For now I am doing this in C++. That is not what I wanted, but it is a bit faster: it takes 0.015 s.
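Here is my code. I have a problem with omp.h: the running time is about the same whether or not I use it, so omp.h does not speed the code up. My machine has more than one CPU.
Does anyone know the reason?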
#include "mex.h"
#include <malloc.h>
#include <math.h>
#include<omp.h>
///////////////////////////////////////////////////////////
// compile with: mex CXXFLAGS="\$CXXFLAGS -fopenmp" LDFLAGS="\$LDFLAGS -fopenmp" mex_convolution.cpp
///////////////////////////////////////////////////////////
// entry function
void mexFunction(int nlhs, mxArray *plhs[],int nrhs, const mxArray *prhs[]) {
double *input,*output,*filter;
int I_row,I_col,filter_rown,nRow,filter_coln,nCol,N;
input = mxGetPr(prhs[0]);
I_row = mxGetM(prhs[0]);
I_col = mxGetN(prhs[0]);
filter = mxGetPr(prhs[1]);
filter_rown = mxGetM(prhs[1]);
filter_coln = mxGetN(prhs[1]);
nRow = I_row-filter_rown+1;
nCol = I_col-filter_coln+1;
plhs[0] = mxCreateDoubleMatrix(nRow,nCol,mxREAL);
output = mxGetPr(plhs[0]);
//handle central region
int col,row;
#pragma omp parallel for
for (col = 0; col < nCol; col++)
{
#pragma omp parallel for
for (row = 0; row < nRow; row++)
{
int idx = 0;
int idx_col,idx_row;
double response = 0;
#pragma omp parallel for
for (int filter_col = 0; filter_col<filter_coln; filter_col++)
{
#pragma omp parallel for
for (int filter_row = 0; filter_row<filter_rown; filter_row++)
{
idx_col = col + filter_col;
idx_row = row + filter_row;
response = response + input[idx_row + idx_col*I_row] * filter[idx];
idx++;
}
}
output[row + col*nRow] = response;
}
}
}
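For comparison, here is a minimal sketch of the same loop nest with a single `#pragma omp parallel for` on the outermost loop. In the code above, `response`, `idx`, `idx_col` and `idx_row` are declared outside the two innermost parallel loops, so they would be shared between threads there; nested parallel regions are also typically disabled by default, in which case the inner pragmas add overhead without adding parallelism. The function name `correlateValid` and its signature are hypothetical, and the code is written as a standalone function rather than a MEX entry point. Like the code above, it does not flip the filter, so it computes a correlation, whereas `conv2` flips the kernel.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): 'valid' 2-D correlation
// on column-major data, using the same indexing as the MEX code above.
std::vector<double> correlateValid(const std::vector<double>& input,
                                   int inRows, int inCols,
                                   const std::vector<double>& filter,
                                   int fRows, int fCols)
{
    const int outRows = inRows - fRows + 1;
    const int outCols = inCols - fCols + 1;
    if (outRows <= 0 || outCols <= 0) {
        return {};  // filter larger than input: no valid output
    }
    std::vector<double> output(static_cast<std::size_t>(outRows) * outCols);

    // One parallel region over output columns only. Variables declared
    // inside the loop body are private to the thread running the iteration.
    #pragma omp parallel for
    for (int col = 0; col < outCols; ++col) {
        for (int row = 0; row < outRows; ++row) {
            double response = 0.0;
            for (int fc = 0; fc < fCols; ++fc) {
                // Walk one input column and one filter column contiguously.
                const double* in = &input[row + (col + fc) * inRows];
                const double* f = &filter[fc * fRows];
                for (int fr = 0; fr < fRows; ++fr) {
                    response += in[fr] * f[fr];
                }
            }
            output[row + col * outRows] = response;
        }
    }
    return output;
}
```

If the OpenMP flag does not reach the compiler, the pragma is ignored and the loop runs serially with the same result, which is one way to end up with identical timings. With only 13 output columns at the sizes in question, at most 13 threads can be kept busy by this split.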