Question

Let:

[220, 220] = size(M); and [208, 208] = size(N).

The following command takes about 0.025 s:

 tic;C = conv2(M,N,'valid');toc

[13,13]=size(C);
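For reference, the 13-by-13 result follows from the 'valid' rule (output length = input length - kernel length + 1 in each dimension), and a rough count of multiply-adds shows how much work the 0.025 s covers. Below is a minimal sketch in C++, the language of the MEX attempt further down. The function names `valid_length` and `direct_macs` are hypothetical helpers for illustration, not part of MATLAB or of any downloaded code.

```cpp
#include <cassert>
#include <cstddef>

// Length of one dimension of a 'valid' convolution output:
// an input of length n and a kernel of length k (0 < k <= n) give n - k + 1 samples.
std::size_t valid_length(std::size_t n, std::size_t k) {
    assert(k > 0 && k <= n);
    return n - k + 1;
}

// Number of multiply-adds when every 'valid' output sample is computed
// directly as a sum over the whole kernel (no FFT).
unsigned long long direct_macs(std::size_t in_rows, std::size_t in_cols,
                               std::size_t k_rows, std::size_t k_cols) {
    const unsigned long long out_rows = valid_length(in_rows, k_rows);
    const unsigned long long out_cols = valid_length(in_cols, k_cols);
    return out_rows * out_cols * k_rows * k_cols;
}
```

With the sizes above, `valid_length(220, 208)` is 13 and `direct_macs(220, 220, 208, 208)` is 7,311,616, so the direct method needs roughly 7.3 million multiply-adds for this case.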

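I want to speed this up, so I downloaded some code from the Internet, with names like Conv_fast and so on.

I tested several of them, including ones that use the FFT. None of them is faster than 0.025 s, and some take 0.2 s.

So is it possible to speed this up?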

As mentioned above, I tried the following function:

function g = conv2FFT(h, f)

% g = conv2FFT(h, f)
%
% DESC:
% computes the 2D convolution via FFT
%
% AUTHOR
% Marco Zuliani - zuliani@ece.ucsb.edu
%
% VERSION:
% 1.0.0
%
% INPUT:
% h                 = convolution kernel
% f                 = input signal
%
% OUTPUT:
% g                 = output signal
%
% HISTORY
% 1.0.0             ??/??/07 Initial version

sh = size(h);
sf = size(f);

% zero pad the input signals
fm = zeros(sf+2*(sh-1), class(f));
o = sh-1;
fm( o(1)+(1:size(f,1)), o(2)+(1:size(f,2)) ) = f;

h_zp = zeros(size(fm), class(h));
h_zp(1:size(h,1), 1:size(h,2)) = h;

% compute the convolution in frequency
F = fft2(fm);
H = fft2(h_zp);
Y = F.*H;

% back to spatial domain
g = real( ifft2(Y) );

% remove padding
o = floor(1.5*size(h))-1;
g = g( o(1)+(1:size(f,1)), o(2)+(1:size(f,2)) );

return

But it takes about 0.1 s.

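Now I use C++ to do this job. That is not what I wanted, but it is a bit faster: it takes 0.015 s.

Here is my code. I have a problem with OpenMP (omp.h): the run time is about the same whether or not I use it, so OpenMP does not speed up the code. I have more than one CPU. Does anyone know why?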

#include "mex.h"
#include <malloc.h>
#include <math.h>
#include<omp.h>
///////////////////////////////////////////////////////////
// compile with: mex CXXFLAGS="\$CXXFLAGS -fopenmp" LDFLAGS="\$LDFLAGS -fopenmp" mex_convolution.cpp
///////////////////////////////////////////////////////////
// entry function
void mexFunction(int nlhs, mxArray *plhs[],int nrhs, const mxArray *prhs[]) {
    double *input,*output,*filter;
        int I_row,I_col,filter_rown,nRow,filter_coln,nCol,N;

        input = mxGetPr(prhs[0]);
        I_row = mxGetM(prhs[0]);
        I_col = mxGetN(prhs[0]);

        filter = mxGetPr(prhs[1]);
        filter_rown = mxGetM(prhs[1]);
        filter_coln = mxGetN(prhs[1]);
        nRow = I_row-filter_rown+1;
        nCol = I_col-filter_coln+1;
        plhs[0] = mxCreateDoubleMatrix(nRow,nCol,mxREAL);
        output = mxGetPr(plhs[0]);

        //handle central region
        int col,row;
        #pragma omp parallel for
        for (col = 0; col < nCol; col++)
        {
                #pragma omp parallel for
                for (row = 0; row < nRow; row++)
                {
                        int idx = 0;
                        int idx_col,idx_row;
                        double response = 0;


                        #pragma omp parallel for
                        for (int filter_col = 0; filter_col<filter_coln; filter_col++)
                        {
                              #pragma omp parallel for
                                for (int filter_row = 0; filter_row<filter_rown; filter_row++)
                                {
                                        idx_col = col + filter_col;
                                        idx_row = row + filter_row;
                                        response = response + input[idx_row + idx_col*I_row] * filter[idx];
                                        idx++;
                                }
                        }
                        output[row + col*nRow] = response;
                }
        }

}
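One thing that may be worth trying, offered as a sketch rather than a confirmed fix: put a single `#pragma omp parallel for` on the outermost loop only, and declare the accumulator inside the loop body so that each iteration has its own copy. In the code above, the inner `parallel for` regions either add nothing (nested parallelism is commonly disabled by default) or, if nesting is enabled, would let several threads update `response` and `idx` at once. The function below is a hypothetical standalone helper, not a drop-in MEX file. It uses the same column-major indexing and the same arithmetic as the posted loops (the kernel is not flipped). It has not been timed here, and it only runs in parallel if the OpenMP flag actually reaches the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: 'valid' sliding-window sum over column-major data.
// Element (r, c) of an R-row matrix is stored at index r + c * R.
std::vector<double> valid_correlate(const std::vector<double>& input,
                                    int in_rows, int in_cols,
                                    const std::vector<double>& filter,
                                    int f_rows, int f_cols) {
    assert(f_rows > 0 && f_cols > 0 && f_rows <= in_rows && f_cols <= in_cols);
    assert(input.size() == static_cast<std::size_t>(in_rows) * in_cols);
    assert(filter.size() == static_cast<std::size_t>(f_rows) * f_cols);

    const int n_rows = in_rows - f_rows + 1;
    const int n_cols = in_cols - f_cols + 1;
    std::vector<double> output(static_cast<std::size_t>(n_rows) * n_cols);

    // One parallel region: each thread handles whole output columns.
    // Without OpenMP enabled the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (int col = 0; col < n_cols; ++col) {
        for (int row = 0; row < n_rows; ++row) {
            double response = 0.0;  // local to this iteration, so never shared
            for (int fc = 0; fc < f_cols; ++fc) {
                // Start of the input and filter columns used for this term.
                const double* in_col = &input[static_cast<std::size_t>(col + fc) * in_rows + row];
                const double* f_col = &filter[static_cast<std::size_t>(fc) * f_rows];
                for (int fr = 0; fr < f_rows; ++fr) {
                    response += in_col[fr] * f_col[fr];
                }
            }
            output[row + static_cast<std::size_t>(col) * n_rows] = response;
        }
    }
    return output;
}
```

For a 3-by-3 input holding 1 to 9 in column-major order and a 2-by-2 filter of ones, the call `valid_correlate(in, 3, 3, f, 2, 2)` returns the four window sums 12, 16, 24, 28. Note that with only 13 output columns in the case above, the outer loop offers limited parallel work per thread, so any gain may be modest.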

In MATLAB: conv2(M,N,'valid') where M and N have similar sizes

0 answers: