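I read the Viola and Jones paper. They state explicitly that their algorithm is faster than others because it avoids computing an image pyramid by scaling the feature rectangles instead.
But after a long search, I found that OpenCV implements the image pyramid approach rather than scaling the feature rectangles, and it computes an integral image for every sub-image in the pyramid. When the algorithm is applied to video rather than a single image, this is done for every frame.
What is the rationale for this choice? I don't quite understand it.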
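My understanding is the exact opposite: for video, the features only need to be scaled once, and the scaled features can be reused for every frame. Only the integral image of the full frame then has to be computed for each frame.
Am I right about this?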
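To make that premise concrete, here is a minimal sketch of why a scaled feature rectangle costs the same four lookups on a single integral image. The names `integralImage`, `boxSum`, `Rect` and `scaleRect` are made up for illustration and are not OpenCV functions, and the integer scale factor is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Integral image padded with one extra row and column of zeros:
// ii[y][x] is the sum of all pixels strictly above and to the left of (x, y).
using Grid = std::vector<std::vector<long long>>;

Grid integralImage(const std::vector<std::vector<int>>& img) {
    const std::size_t h = img.size();
    const std::size_t w = h ? img[0].size() : 0;
    Grid ii(h + 1, std::vector<long long>(w + 1, 0));
    for (std::size_t y = 0; y < h; ++y)
        for (std::size_t x = 0; x < w; ++x)
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x];
    return ii;
}

// Sum of the w-by-h rectangle whose top-left pixel is (x, y): always four lookups,
// regardless of how large the rectangle is.
long long boxSum(const Grid& ii, std::size_t x, std::size_t y,
                 std::size_t w, std::size_t h) {
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x];
}

// A feature rectangle in detection-window coordinates (hypothetical type).
struct Rect { std::size_t x, y, w, h; };

// Scale the rectangle instead of the image (integer factor for simplicity).
Rect scaleRect(const Rect& r, std::size_t factor) {
    return { r.x * factor, r.y * factor, r.w * factor, r.h * factor };
}
```

With this layout, evaluating a rectangle at twice the size means calling `scaleRect` once and then `boxSum` on the same integral image; nothing about the image has to be recomputed.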
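Viola and Jones also reported a frame rate of 15 fps on a Pentium 3 machine, yet I rarely see anyone reach that kind of performance with OpenCV on a modern computer. That's strange, isn't it?
Any input would help. Thanks.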
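Answer 0 (score: 0)
I tried to verify this by looking at their code, based on version 2.4.10. The short answer is: both. OpenCV scales the image according to the scale factor at which detection is performed, and it can also rescale the features for different window sizes according to the scale factor. The reasoning is as follows:
1. Look at the older function cvHaarDetectObjectsForROC from the objdetect module (haar.cpp). The parameters worth noting are CvSize minSize, CvSize maxSize, const CvArr* _img, double scaleFactor and int minNeighbors.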
    CvSeq*
    cvHaarDetectObjectsForROC( const CvArr* _img,
        CvHaarClassifierCascade* cascade, CvMemStorage* storage,
        std::vector<int>& rejectLevels, std::vector<double>& levelWeights,
        double scaleFactor, int minNeighbors, int flags,
        CvSize minSize, CvSize maxSize, bool outputRejectLevels )
    {
        CvMat stub, *img = (CvMat*)_img;
        .... // skip a bit ahead to this part
        if( flags & CV_HAAR_SCALE_IMAGE )
        {
            CvSize winSize0 = cascade->orig_window_size; // this would be the trained size of 24x24 pixels mentioned in the paper
            for( factor = 1; ; factor *= scaleFactor )
            {
                // detection window for the current scale
                CvSize winSize = { cvRound(winSize0.width*factor), cvRound(winSize0.height*factor) };
                // resized image size
                CvSize sz = { cvRound( img->cols/factor ), cvRound( img->rows/factor ) };
                // take every scale factor as long as the resulting window does not exceed
                // the given maximum size and is not smaller than the minimum one
                if( winSize.width > maxSize.width || winSize.height > maxSize.height )
                    break;
                if( winSize.width < minSize.width || winSize.height < minSize.height )
                    continue;
                img1 = cvMat( sz.height, sz.width, CV_8UC1, imgSmall->data.ptr );
                ... // skip initialization of the sum, square sum and tilted sum (a.k.a. integral image) arrays
                cvResize( img, &img1, CV_INTER_LINEAR ); // scaling down the image here
                cvIntegral( &img1, &sum1, &sqsum1, _tilted ); // compute the integral representation of the scaled-down version
                ... // skip some lines
                // set the structures; this also rescales the features according to the last
                // parameter, which is the scale factor. Notice it is 1.0 here because the
                // image itself was scaled down this time.
                cvSetImagesForHaarClassifierCascade( cascade, &sum1, &sqsum1, _tilted, 1. );
                // <call detection function with notable arguments: cascade, ... factor, cv::Mat(&sum1), cv::Mat(&sqsum1) ...>
                // the above call is a parallel for that evaluates a window at a certain position in the image with the cascade classifier
                // note the class name HaarDetectObjects_ScaleImage_Invoker in the actual code, skipped here
            } // end for
        } // if
        else
        {
            int n_factors = 0; // total number of factors
            cvIntegral( img, sum, sqsum, tilted ); // a single integral image for the given image (the original one passed to cvHaarDetectObjects)
            // the loop below only counts the scale factors at which detection is performed
            for( n_factors = 0, factor = 1;
                 factor*cascade->orig_window_size.width < img->cols - 10 &&
                 factor*cascade->orig_window_size.height < img->rows - 10;
                 n_factors++, factor *= scaleFactor );
            ... // skip some lines
            for( ; n_factors-- > 0; factor *= scaleFactor )
            {
                CvSize winSize = { cvRound( cascade->orig_window_size.width * factor ), cvRound( cascade->orig_window_size.height * factor )};
                ... // skip check for minSize and maxSize here
                // notice that the scale factor is passed here so that the trained Haar features can be rescaled
                cvSetImagesForHaarClassifierCascade( cascade, sum, sqsum, tilted, factor );
                // <parallel for detect call given startX, endX, startY, endY, window size and cascade>
                // note the class name HaarDetectObjects_ScaleCascade_Invoker in the actual code, skipped here
            }
        } // end of if
        ... // skip rest
    } // end of cvHaarDetectObjectsForROC function
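To compare the two branches side by side, here is a minimal sketch that only reproduces the scale loops from the excerpt above and records, for each scale, the window size and the size of the image whose integral image gets computed. `Size`, `ScaleStep`, `roundToInt`, `planScaleImage` and `planScaleCascade` are hypothetical names, not OpenCV API. Two assumptions: `roundToInt` stands in for `cvRound` and may differ from it on exact .5 ties, and `factor` is reset to 1 in the lines the excerpt skips between the two loops of the second branch.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Size { int width, height; };

// One entry per scale at which detection would run.
struct ScaleStep {
    double factor;  // accumulated scale factor
    Size window;    // detection window size in original-image pixels
    Size image;     // size of the image whose integral image is used
};

// Hypothetical stand-in for cvRound (tie handling may differ).
static int roundToInt(double v) { return static_cast<int>(std::lround(v)); }

// CV_HAAR_SCALE_IMAGE branch: a fresh, smaller image (and integral image) per scale.
std::vector<ScaleStep> planScaleImage(Size img, Size win0, double scaleFactor,
                                      Size minSize, Size maxSize) {
    std::vector<ScaleStep> steps;
    if (scaleFactor <= 1.0) return steps;  // guard: the loop would never terminate
    for (double factor = 1; ; factor *= scaleFactor) {
        Size win = { roundToInt(win0.width * factor), roundToInt(win0.height * factor) };
        Size sz  = { roundToInt(img.width / factor),  roundToInt(img.height / factor) };
        if (win.width > maxSize.width || win.height > maxSize.height) break;
        if (win.width < minSize.width || win.height < minSize.height) continue;
        steps.push_back({ factor, win, sz });  // resize + integral image happen here
    }
    return steps;
}

// Other branch: one integral image of the full frame, features rescaled per scale.
std::vector<ScaleStep> planScaleCascade(Size img, Size win0, double scaleFactor) {
    std::vector<ScaleStep> steps;
    if (scaleFactor <= 1.0) return steps;
    int nFactors = 0;
    for (double f = 1;
         f * win0.width < img.width - 10 && f * win0.height < img.height - 10;
         f *= scaleFactor)
        ++nFactors;
    double factor = 1;  // assumption: reset in the lines the excerpt skips
    for (; nFactors-- > 0; factor *= scaleFactor) {
        Size win = { roundToInt(win0.width * factor), roundToInt(win0.height * factor) };
        steps.push_back({ factor, win, img });  // image size never changes
    }
    return steps;
}
```

For a 100x100 frame, a 24x24 trained window and a scale factor of 2, the first plan lists images of 100, 50 and 25 pixels per side, while the second always reports the full 100x100 frame and only the window grows.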
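If you use the newer C++ API class CascadeClassifier and it loads a cascade in the new .xml format produced by the traincascade.exe application, it scales the image according to the scale factor (for Haar features it should, as far as I know). Somewhere in its code, the class's detectMultiScale method ends up calling the detectSingleScale method: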
    // from cascadedetect.cpp, in the detectMultiScale method
    if( !detectSingleScale( scaledImage, stripCount, processingRectSize, stripSize, yStep, factor, candidates, rejectLevels, levelWeights, outputRejectLevels ) )
        break;
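Since this path builds a scaled image for every factor, the extra integral-image work compared with a single full-frame integral image can be estimated with a bit of arithmetic. The sketch below is an illustration only: `pyramidPixels` is a made-up helper, and its stopping rule (the window must still fit in the frame) is a simplification, not the exact condition OpenCV uses. It also says nothing about the cost of evaluating the cascade itself, which may well dominate.

```cpp
#include <cassert>
#include <cmath>

// Total number of pixels over all downscaled images, for a square window that
// starts at win0 pixels and grows by scaleFactor until it no longer fits the frame.
long long pyramidPixels(int width, int height, int win0, double scaleFactor) {
    long long total = 0;
    if (scaleFactor <= 1.0) return total;  // guard: the loop would never terminate
    for (double factor = 1;
         win0 * factor <= width && win0 * factor <= height;
         factor *= scaleFactor) {
        const long long w = std::llround(width / factor);
        const long long h = std::llround(height / factor);
        total += w * h;  // one resize and one integral image of this size
    }
    return total;
}
```

Because each level shrinks by roughly 1/s² in area, the total is a geometric series bounded by about 1/(1 - 1/s²) times the frame area, which is roughly 5.8 for s = 1.1. So under these assumptions the pyramid costs a small constant multiple of one full-frame integral image per frame, not something that grows with the number of scales.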
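A possible reason I can think of: to keep the C++ design uniform, this was the only way to handle different feature types transparently through a single interface.
I have left my trail of reasoning here so that, in case I misunderstood or missed something, other users can correct me by checking it.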