我有不同旋转方向的图像。我想使用互相关最大化找到正确的旋转角度。由于我的图片集很大,我想使用mex文件here加快normxcorr2
功能。
我使用以下代码计算matched_angle:
function [matched_angle, max_corr_vecq, matched_angle_mex, max_corr_vecq_mex] = get_correct_rotation(moving, fixed)
for theta = 360:-10:10
rotated = imrotate(moving, theta,'bicubic','crop');
corr2d_map = normxcorr2(double(rotated), double(fixed));
corr2d_map_mex = normxcorr2_mex(double(rotated), double(fixed),'full');
[max_corr_vec(theta/10), ~] = max(corr2d_map(:));
[max_corr_vec_mex(theta/10), ~] = max(corr2d_map_mex(:));
end
% Interpolate correlation max vector for half degree resolution
max_corr_vecq = interp1(10:10:360, max_corr_vec, 0.5:0.5:360, 'spline');
[~, matched_angle] = max(max_corr_vecq);
matched_angle = 0.5 * matched_angle;
% Interpolate correlation max vector for half degree resolution
max_corr_vecq_mex = interp1(10:10:360, max_corr_vec_mex, 0.5:0.5:360, 'spline');
[~, matched_angle_mex] = max(max_corr_vecq_mex);
matched_angle_mex = 0.5 * matched_angle_mex;
end
然而,对于两个不同的normxcorr2
&同样使用这两个相同的图像(Moving Template Image& Fixed Reference Image) normxcorr2_mex
给出完全不同的结果。
plot(0.5:0.5:360, max_corr_vecq, 'linewidth',2); hold on;
plot(0.5:0.5:360, max_corr_vecq_mex, 'linewidth',2);
legend({'MATLAB Built-in', 'MEX'});
set(gca, 'FontSize', 14, 'FontWeight', 'bold');
请参阅Result Plot。
有没有人知道发生了什么?我找不到关于该mex文件准确性的任何条目。据作者说:
以下是等效的:
result = normxcorr2_mex(template, image, 'full');
和
result = normxcorr2(template, image);
除了normxcorr2_mex沿边界'无效'区域有0'
在我的情况下应该不是问题。因为我只检查最大相关值。
答案 0 :(得分:1)
自从我上次回答以来,我发现normcorr2_mex library始终比我的用例要慢(比MATLAB慢)并且在 all 中都是不正确的。
由于我确实需要C ++实现(可以使用MATLAB进行验证),因此我创建了自己的实现。代码在这里列出:
/* normxcorr2_mex.cpp
*
* A MATLAB-mex wrapper around a C/C++ implementation of the Normalised Cross Correlation algorithm described
* by @dafnahaktana in https://stackoverflow.com/questions/44591037/speed-up-calculation-of-maximum-of-normxcorr2.
*
* This module uses the 'integral image' data structure described in the posted MATLAB/Octave code (based upon the
* original Industrial Light & Magic paper at http://scribblethink.org/Work/nvisionInterface/nip.pdf), but replaces
* the "naive" correlation step with a Fourier transform implementation for larger template sizes.
*
* Daniel Eaton released a MATLAB-mex library (http://www.cs.ubc.ca/research/deaton/remarks_ncc.html) with the
* same function name as this one in 2013. Indeed, I acknowledge [and flatteringly plagiarise] his interface and
* naming convention. Unfortunaly, I was unable to duplicate the speed (wrt MATLABs normxcorr2) improvements he
* claimed with the image sizes I required. Curiously, I also observed different results using his library compared
* with MATLABs built-in function (despite being claimed to be identical). This was also noted by others here:
* https://stackoverflow.com/questions/48641648/different-results-of-normxcorr2-and-normxcorr2-mex. This module
* does match normxcorr2 on both the MATLAB R2016b and R2017a/b versions tested, using the (accompanying) test script.
* Like Daniel's module, however, this function returns only the 'valid' region of correlation values, i.e. it
* doesn't pad the output array to match the input image size.
*
* This function is called via:
* NCC = normxcorr2_mex (TEMPLATE, A);
* Where:
* TEMPLATE - The (double precision) matrix to correlate with A.
* A - (Double precision) input matrix for correlation with the TEMPLATE. Note size(A) > size(TEMPLATE).
* NCC - is the computed normalised cross correlation coefficients of the matrices TEMPLATE and A.
* The size of the correlation coefficient matrix is given as:
*
* size(NCC) = [(Ar - TEMPLATEr + 1), (Ac - TEMPLATEc + 1)] ; where:
*
* Ar, Ac and TEMPLATEr, TEMPLATEc are the number of (rows, cols) of A and TEMPLATE respectively.
*
* This module requires the Eigen C++ library (http://eigen.tuxfamily.org/index.php?title=Main_Page) for compilation
* and may be compiled within MATLAB via:
*
* mex -I'[Path to]\eigen-3.3.5' normxcorr2_mex.cpp
*
* Since NCC is such a computationally intensive task, this module may be linked against the openMP library to exploit a
* pool of worker threads and distribute some of the embarrassingly parellel operations within across a number of CPU cores.
* Only rudimentary use is made of the library, but the following compilation option provides speedups generally
* exceeding 50%:
*
* mex -I'[Path to]\eigen-3.3.5' CXXFLAGS="$CXXFLAGS -fopenmp" LDFLAGS="$LDFLAGS -fopenmp" normxcorr2_mex.cpp
*
*
* You are free to do with this code as you wish. For this reason, it is released under the UNLICENSE model:
*
* This is free and unencumbered software released into the public domain.
*
* Anyone is free to copy, modify, publish, use, compile, sell, or
* distribute this software, either in source code form or as a compiled
* binary, for any purpose, commercial or non-commercial, and by any
* means.
*
* In jurisdictions that recognize copyright laws, the author or authors
* of this software dedicate any and all copyright interest in the
* software to the public domain. We make this dedication for the benefit
* of the public at large and to the detriment of our heirs and
* successors. We intend this dedication to be an overt act of
* relinquishment in perpetuity of all present and future rights to this
* software under copyright law.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
* OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
* ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*
* For more information, please refer to <http://unlicense.org/>
*/
#include "mex.h"
#include <cstring>
#include <algorithm>
#include <limits>
#include <vector>
#include <cmath>
#include <complex>
#include <iostream>
#include <Eigen/Core>
#include <unsupported/Eigen/FFT>
using namespace Eigen;
// If we're compiled/linked with openMP, turn off Eigen's parallelisation
#ifdef _OPENMP
#define EIGEN_DONT_PARALLELIZE
#define EIGEN_NO_DEBUG
#endif
// For very small input templates, performing the raw 2D correlation in the spatial domain may be faster than
// the transform domain (due to the overhead that the latter involves). The decision which approach to use is
// made at runtime by comparing the size (=rows*cols) of the input TEMPLATE matrix with the following constant.
// Feel free to experiment with this value in your own application!
#define TEMPLATE_SIZE_THRESHOLD 401
// 2D Cross-correlation performed via the "naive approach" (laborious spatial domain convolution).
ArrayXXd spatialXcorr (const Ref<const ArrayXXd>& img, const Ref<const ArrayXXd>& templ)
{
int32_t r, c;
ArrayXXd xcorr2(img.rows()-templ.rows()+1, img.cols()-templ.cols()+1);
for (r=0; r<(img.rows()-templ.rows()+1); r++)
for (c=0; c<(img.cols()-templ.cols()+1); c++)
xcorr2(r,c) = (templ*img.block(r,c,templ.rows(),templ.cols())).sum();
return(xcorr2);
}
// 2D Cross-correlation performed via Fourier transform
ArrayXXd transformXcorr (const Ref<const ArrayXXd>& img, const Ref<const ArrayXXd>& templ)
{
ArrayXXd xcorr2(img.rows()-templ.rows()+1, img.cols()-templ.cols()+1);
// Copy the input arrays into a matrix the next power-of-2 up in size
int32_t nextPow2r = (int32_t)(pow(2.0, round(0.5+log((double)(img.rows()))/log(2.0))));
int32_t nextPow2c = (int32_t)(pow(2.0, round(0.5+log((double)(img.cols()))/log(2.0))));
MatrixXd imgPwr2 = MatrixXd::Zero(nextPow2r, nextPow2c);
MatrixXd templPwr2 = MatrixXd::Zero(nextPow2r, nextPow2c);
// A -> copied to top-left corner.
// TEMPLATE is rotated 180 degrees to account for rotation/flip performed during convolution.
imgPwr2.block(0, 0, img.rows(), img.cols()) = img.matrix();
templPwr2.block(0, 0, templ.rows(), templ.cols()) = (templ.matrix().colwise().reverse()).rowwise().reverse();
// Perform 2D FFTs via sequential 1D transforms (Rows first, then columns)
MatrixXcd imgFT(nextPow2r, nextPow2c), templFT(nextPow2r, nextPow2c), prodFT(nextPow2r, nextPow2c);
// Rows first...
#ifdef _OPENMP // If using parallel threads, then each thread
// must have it's own copy of the eigenFFT plan.
#pragma omp parallel for schedule(dynamic)
for (int32_t r=0; r<nextPow2r; r++) { // This is unnecesary for single-threaded execution as
// each evaluation of the FFT is identical in length
VectorXcd rowVec(nextPow2c); // and data type.
FFT<double> eigenFFT;
// The creation of the plan is computationally expensive
#else // and so we do it once, outside of the loop in the single
// threaded case (to reduce the run time by a factor > 2).
VectorXcd rowVec(nextPow2c);
FFT<double> eigenFFT;
for (int32_t r=0; r<nextPow2r; r++) {
#endif
eigenFFT.fwd(rowVec, imgPwr2.row(r));
imgFT.row(r) = rowVec;
eigenFFT.fwd(rowVec, templPwr2.row(r));
templFT.row(r) = rowVec;
}
// ...then columns.
#ifdef _OPENMP
#pragma omp parallel for schedule(dynamic)
for (int32_t c=0; c<nextPow2c; c++) {
VectorXcd colVec(nextPow2r);
FFT<double> eigenFFT;
#else
VectorXcd colVec(nextPow2r);
for (int32_t c=0; c<nextPow2c; c++) {
#endif
eigenFFT.fwd(colVec, imgFT.col(c));
imgFT.col(c) = colVec;
eigenFFT.fwd(colVec, templFT.col(c));
templFT.col(c) = colVec;
}
// Mutliply complex Fourier domain matricies
prodFT = imgFT.cwiseProduct(templFT);
// Transform (complex) Fourier product back -> (real) spatial domain (2D IFFT).
// Reuse templPwr2 as the output variable for efficiency.
// Rows first (again)...
#ifdef _OPENMP
#pragma omp parallel for schedule(dynamic)
for (int32_t r=0; r<nextPow2r; r++) {
FFT<double> eigenFFT;
VectorXcd rowVec(nextPow2c);
#else
for (int32_t r=0; r<nextPow2r; r++) {
#endif
eigenFFT.inv(rowVec, prodFT.row(r));
prodFT.row(r) = rowVec;
}
// ...and lastly, columns.
#ifdef _OPENMP
#pragma omp parallel for schedule(dynamic)
for (int32_t c=0; c<nextPow2c; c++) {
FFT<double> eigenFFT;
VectorXcd colVec(nextPow2r);
#else
for (int32_t c=0; c<nextPow2c; c++) {
#endif
eigenFFT.inv(colVec, prodFT.col(c));
templPwr2.col(c) = colVec.real();
}
// Extract the valid region of correlation coefficients
xcorr2 = templPwr2.array().block(templ.rows()-1, templ.cols()-1, img.rows()-templ.rows()+1, img.cols()-templ.cols()+1);
return(xcorr2);
}
// Normalised cross-correlation top-level function
ArrayXXd normxcorr2 (const Ref<const ArrayXXd>& templ, const Ref<const ArrayXXd>& img)
{
ArrayXXd templZMean(templ.rows(), templ.cols());
ArrayXXd scalingCoeffs(img.rows() - templ.rows() +1, img.cols() - templ.cols() +1);
ArrayXXd normxcorr(img.rows()-templ.rows()+1, img.cols()-templ.cols()+1);
ArrayXXd integralImg(img.rows()+2, img.cols()+2), integralImgSq(img.rows()+2, img.cols()+2);
ArrayXXd windowMeanA = ArrayXXd::Zero(img.rows() - templ.rows() +1, img.cols() - templ.cols() +1);
ArrayXXd windowMeanASq = ArrayXXd::Zero(img.rows() - templ.rows() +1, img.cols() - templ.cols() +1);
// Calculate the standard deviation of the TEMPLATE
double templSizeRcp = 1.0/(double)(templ.rows()*templ.cols());
templZMean = templ-templ.mean();
double templateStd = sqrt((templZMean.pow(2)).sum()*templSizeRcp);
// Compute mean and standard deviation of input matrix A over the template window size. Firsly...
// Construct array for computing the integral image(s) + zero pad the edges to avoid boundary issues
integralImg.block(0, 0, 1, integralImg.cols()) = ArrayXXd::Zero(1, integralImg.cols());
integralImg.block(0, 0, integralImg.rows(), 1) = ArrayXXd::Zero(integralImg.rows(), 1);
integralImg.block(0, integralImg.cols()-1, integralImg.rows(), 1) = ArrayXXd::Zero(integralImg.rows(), 1);
integralImg.block(integralImg.rows()-1, 0, 1, integralImg.cols()) = ArrayXXd::Zero(1, integralImg.cols());
integralImgSq.block(0, 0, 1, integralImgSq.cols()) = ArrayXXd::Zero(1, integralImgSq.cols());
integralImgSq.block(0, 0, integralImgSq.rows(), 1) = ArrayXXd::Zero(integralImgSq.rows(), 1);
integralImgSq.block(0, integralImgSq.cols()-1, integralImgSq.rows(), 1) = ArrayXXd::Zero(integralImgSq.rows(), 1);
integralImgSq.block(integralImgSq.rows()-1, 0, 1, integralImgSq.cols()) = ArrayXXd::Zero(1, integralImgSq.cols());
// Calculate cumulative sum. Along the length of each row first...
for (int32_t r=0; r<img.rows(); r++) {
double sum = 0.0;
double sumSq = 0.0;
for (int32_t c=0; c<img.cols(); c++) {
sum += img(r,c);
sumSq += (img(r,c)*img(r,c));
integralImg(r+1, c+1) = sum;
integralImgSq(r+1, c+1) = sumSq;
}
}
// ...and then down each column.
for (int32_t c=1; c<=img.cols(); c++) {
double sum = 0.0;
double sumSq = 0.0;
for (int32_t r=1; r<=img.rows(); r++) {
sum += integralImg(r,c);
sumSq += integralImgSq(r,c);
integralImg(r,c) = sum;
integralImgSq(r,c) = sumSq;
}
}
// Determine start/finish indexes for the boundaries of the summed area
int32_t rStart = (int32_t)(0.5 + templ.rows()/2.0);
int32_t rEnd = img.rows() - rStart + (templ.rows() % 2);
int32_t cStart = (int32_t)(0.5 + templ.cols()/2.0);
int32_t cEnd = img.cols() - cStart + (templ.cols() % 2);
// Evaluate the sum of intensities
windowMeanA += ( integralImg.block(templ.rows(), templ.cols(), rEnd-rStart+1, cEnd-cStart+1) \
- integralImg.block(templ.rows(), 0, rEnd-rStart+1, cEnd-cStart+1) \
- integralImg.block(0, templ.cols(), rEnd-rStart+1, cEnd-cStart+1) \
+ integralImg.block(0, 0, rEnd-rStart+1, cEnd-cStart+1) )*templSizeRcp;
// Evaluate the sum of intensities (squared)
windowMeanASq += ( integralImgSq.block(templ.rows(), templ.cols(), rEnd-rStart+1, cEnd-cStart+1) \
- integralImgSq.block(templ.rows(), 0, rEnd-rStart+1, cEnd-cStart+1) \
- integralImgSq.block(0, templ.cols(), rEnd-rStart+1, cEnd-cStart+1) \
+ integralImgSq.block(0, 0, rEnd-rStart+1, cEnd-cStart+1) )*templSizeRcp;
// Calculate the standard deviation (squared) of A over the template size window
// Standard deviation = sqrt(windowMeanASq - windowMeanA.square());
scalingCoeffs = (windowMeanASq - windowMeanA.square());
// Amalgamate the element-by-element test/square root with other coefficients scaling for efficiency
for (int32_t r=0; r<scalingCoeffs.rows(); r++)
for (int32_t c=0; c<scalingCoeffs.cols(); c++)
if (scalingCoeffs(r,c) > 0)
scalingCoeffs(r,c) = templSizeRcp/(templateStd*sqrt(scalingCoeffs(r,c)));
else
scalingCoeffs(r,c) = std::numeric_limits<double>::quiet_NaN();
// Decide which 2D correlation approach to use (transform or spatial domain)
if ((templ.rows()*templ.cols()) > TEMPLATE_SIZE_THRESHOLD)
normxcorr = scalingCoeffs*transformXcorr(img, templZMean);
else
normxcorr = scalingCoeffs*spatialXcorr(img, templZMean);
return(normxcorr);
}
// ******************** Minimal MEX wrapper ********************
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
// Check the number of arguments
if (nrhs != 2)
mexErrMsgIdAndTxt("MATLAB:normxcorr2_mex", "Usage: NCC = normxcorr2_mex (TEMPLATE, A);");
// Verify input array sizes
size_t rowsTempl = mxGetM(prhs[0]);
size_t colsTempl = mxGetN(prhs[0]);
size_t rowsA = mxGetM(prhs[1]);
size_t colsA = mxGetN(prhs[1]);
if ((rowsA <= rowsTempl) || (colsA <= colsTempl))
mexErrMsgIdAndTxt("MATLAB:normxcorr2_mex", "Size of TEMPLATE must be less than input matrix A.");
#ifdef _OPENMP
// Required for Eigen versions < 3.3 and for *some* non-compliant C++11 compilers.
// (Warn Eigen our application might be calling it from multiple threads).
initParallel();
#endif
// Perform correlation
ArrayXXd xcorr(rowsA-rowsTempl+1, colsA-colsTempl+1);
xcorr = normxcorr2 (Map<ArrayXXd>(mxGetPr(prhs[0]), rowsTempl, colsTempl), Map<ArrayXXd>(mxGetPr(prhs[1]), rowsA, colsA));
// Return data to MATLAB
plhs[0] = mxCreateDoubleMatrix(rowsA-rowsTempl+1, colsA-colsTempl+1, mxREAL);
Map<ArrayXXd> (mxGetPr(plhs[0]), xcorr.rows(), xcorr.cols()) = xcorr;
return;
}
根据标题中的注释,将文件保存到normxcorr2_mex.cpp
并使用以下命令进行编译:
mex -I'[Path to]\eigen-3.3.5' normxcorr2_mex.cpp
的
单线程操作,或使用mex -I'[Path to]\eigen-3.3.5' CXXFLAGS="$CXXFLAGS -fopenmp" LDFLAGS="$LDFLAGS -fopenmp"
normxcorr2_mex.cpp
用于多线程openMP支持。可以使用以下MATLAB脚本来验证代码的时间安排和正确操作:
% testHarness.m
%
% Verify the results of the compiled normxcorr2_mex() function against
% MATLABs inbuilt normxcorr2() function. This takes aaaaages to run!
%% Simulation/comparison parameters
nRunsA = 50; % Number of trials for accuracy comparison
nRunsT = 30; % Number of repetitions for execution time detemination
nStepsT = 50; % Number of input matrix size steps to take in execution time measurement
maxImSize = [1343 1745]; % (Deliberately non-round-number) maximum image size for tests
maxTemplSize = [248 379]; % Maximum image template size
%% Accuracy comparison
sumSqErr = zeros(1, nRunsA);
fprintf(2, 'Accuracy comparison\n');
for nRun = 1:nRunsA
fprintf('Run %d (of %d)\n', nRun, nRunsA);
% Create input images/templates of random content and size
randSizeScale = 0.02 + 0.98*rand(1, 2);
img = rand(round(maxImSize.*randSizeScale));
templ = rand(round(maxTemplSize.*randSizeScale));
% MATLABs inbuilt function
resultMatPadded = normxcorr2(templ, img);
% Remove unwanted padding
[rTempl, cTempl] = size(templ);
[rImg, cImg] = size(img);
resultMat = resultMatPadded(rTempl:rImg, cTempl:cImg);
% MEX function
resultMex = normxcorr2_mex(templ, img);
% Compare results
sumSqErr(nRun) = sum(sum( (resultMat-resultMex).^2 ));
end
figure;
plot(sumSqErr);
title('Accuracy comparison between MATLAB and MEX normxcorr2');
xlabel('Run #');
ylabel('\Sigma |MATLAB-MEX|^2');
grid on;
%% Timing comparison
avMatT = zeros(1, nStepsT);
avMexT = zeros(1, nStepsT);
fprintf(2, 'Timing comparison\n');
for stp = 1:nStepsT
fprintf('Run %d (of %d)\n', stp, nStepsT);
% Create input images/templates of random content and progressively larger size
img = rand(round(maxImSize*stp/nStepsT));
templ = rand(round(maxTemplSize.*stp/nStepsT));
% MATLABs function
tStart = tic;
for exec = 1:nRunsT
dummy = normxcorr2(templ, img);
end
avMatT(stp) = toc(tStart)/nRunsT;
% MEX function
tStart = tic;
for exec = 1:nRunsT
dummy = normxcorr2_mex(templ, img);
end
avMexT(stp) = toc(tStart)/nRunsT;
end
figure;
plot((1:nStepsT)/(0.01*nStepsT), avMatT, 'rx-', (1:nStepsT)/(0.01*nStepsT), avMexT, 'bo-');
title('Execution time comparison between MATLAB and MEX normxcorr2');
xlabel('Input array size [% of maximum]');
ylabel('Evaluation time [s]');
legend('MATLAB', 'MEX');
grid on;
上述C ++ / mex实现和MATLAB的内置normxcorr2
函数agree达到了接近基本双精度数据类型极限的水平。事实证明,即使this在我的i7-980 CPU上运行时,最新的MATLAB normxcorr2
甚至在使用openMP时,速度也很难击败。
答案 1 :(得分:-1)
不幸的是,我没有任何解释,但可以确认问题似乎出在库而不是您的实现上。我在Windows下使用MinGW64编译器构建normxcorr2_mex
库时遇到了问题,这使我对构建之间的可能变化感到担心。与下图included here所示,与MATLAB内置的normxcorr2
函数相比,在Debian Linux和Windows下的构建都表现出相同(不正确)的行为。
为帮助其他人在Windows下构建该库,我不得不使用以下命令行强制C ++编译器:
mex -O CXXFLAGS="$CXXFLAGS -std=c++03 -fpermissive" normxcorr2_mex.cpp cv_src/*.cpp
顺便说一句,我还发现mex的实现要比MATLAB慢一个数量级!