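I need to build a computer vision task that detects water bottles or soda cans. I will be given "frontal" images of bottles, soda cans, or any other random object (one at a time), and my algorithm should determine whether each one is a bottle, a can, or neither.
Some details about the object detection scenario:
Just in case, an example of each:
I have tested the OpenCV face detection algorithms a few times and I know they work well, but with that approach I would need a special Haar cascade features XML file for every custom object I want to detect.
So, the different options I have in mind are:
I would like a simple algorithm, and I think creating a custom Haar classifier may not even be necessary. What would you suggest?
I have strongly considered the shape/aspect ratio approach.
However, I expect to run into problems, since bottles come in different sizes and even shapes. That led me to the following considerations: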
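What I have achieved:
Thresholding really helped me. I noticed that in the white-background tests I get this for cans:
And this is what I get for bottles:
So the dominance of the darker areas is evident. With cans there are some cases that could turn into false negatives. With bottles, light and angle may lead to inconsistent results, but I really think this could be a shorter route.
So now I am unsure how I should evaluate that dark dominance. I have read that findContours leads to it, but I am lost on how to capture such a feature. For example, with soda cans it may find several contours, so I do not know which one to evaluate.
Note: I am open to testing any other algorithm or library besides OpenCV.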
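One way to put a number on "dark dominance" is to measure how much of the object's bounding box is dark after thresholding. The sketch below is only an illustration under stated assumptions: the input is a binary image on a white background (0 = dark, 255 = light), the function name `dark_fill_ratio` is made up for this example, and the two test images are synthetic stand-ins rather than real photos.

```python
import numpy as np

def dark_fill_ratio(binary):
    """Fraction of dark pixels inside the bounding box of all dark pixels.

    `binary` is a 2-D uint8 array as produced by a binary threshold on a
    white background: 0 = dark (object), 255 = light (background).
    Returns 0.0 when the image contains no dark pixel at all.
    """
    dark = binary == 0
    if not dark.any():
        return 0.0
    rows = np.flatnonzero(dark.any(axis=1))   # rows containing dark pixels
    cols = np.flatnonzero(dark.any(axis=0))   # columns containing dark pixels
    box = dark[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    return float(box.mean())

# Synthetic stand-ins for the two thresholded results (assumed shapes).
solid = np.full((100, 60), 255, np.uint8)
solid[10:90, 20:40] = 0                  # opaque object: filled silhouette

outline = np.full((100, 60), 255, np.uint8)
outline[10:90, 20:40] = 0
outline[12:88, 22:38] = 255              # see-through object: only the rim is dark

print(dark_fill_ratio(solid), dark_fill_ratio(outline))
```

Working on the whole mask sidesteps the question of which contour to evaluate. If contours are preferred, a common choice is to keep only the one with the largest `cv2.contourArea` and measure that.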
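Answer 0: (score: 2)
I see a few basic ideas here:
Answer 1: (score: 2)
Since you want to recognize can vs bottle rather than pepsi vs coke, shape matching is probably the way to go, compared with options such as Haar and the features2d matchers SIFT/SURF/ORB.
A unique background color will make things easier.
First, create a histogram from an image of just the background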
int channels[] = {0,1,2}; // use all the channels
int rgb_bins = 32; // quantize to 32 colors per channel
int histSize[] = {rgb_bins, rgb_bins, rgb_bins};
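float _range[] = {0,256}; // the upper bound is exclusive, so 256 is needed to include the value 255
const float* ranges[] = {_range, _range, _range}; // calcHist expects const float**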
cv::SparseMat bghist;
cv::calcHist(&bg_image, 1, channels, cv::noArray(),bghist, 3, histSize, ranges );
Then use calcBackProject to create a mask of bg and not-bg
cv::MatND temp_ND;
cv::calcBackProject( &bottle_image, 1, channels, bghist, temp_ND, ranges );
cv::Mat bottle_mask, bottle_backproj;
if( feeling_lazy ){
cv::normalize(temp_ND, bottle_backproj, 0, 255, cv::NORM_MINMAX, CV_8U);
//a small blur here could work nicely
threshold( bottle_backproj, bottle_mask, 0, 255, THRESH_OTSU );
bottle_mask = cv::Scalar(255) - bottle_mask; //invert the mask
} else {
//finding just the right value here might be better than the above method
int magic_threshold = 64;
temp_ND.convertTo( bottle_backproj, CV_8U, 255.);
//I expect temp_ND to be CV_32F ranging from 0-1, but I might be wrong.
threshold( bottle_backproj, bottle_mask, magic_threshold, 255, THRESH_BINARY_INV );
}
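Then either:
Use matchTemplate to compare bottle_mask/bottle_backproj against a couple of sample bottle masks/back projections, and decide from the confidence whether it is a match.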
matchTemplate(bottle_mask, bottle_template, result, CV_TM_CCORR_NORMED);
double confidence; minMaxLoc( result, NULL, &confidence);
Or use matchShapes, though I have never gotten it to work properly.
double confidence = matchShapes(bottle_mask, bottle_template, CV_CONTOURS_MATCH_I3);
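Or use linemod, which is difficult to set up but works well for images like this where the shape is not very complex. Apart from the linked file, I have not found any working samples of this method, so here is what I did.
First, create/train the detector with some sample images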
//some magic numbers
std::vector<int> T_at_level;
T_at_level.push_back(4);
T_at_level.push_back(8);
//add some padding so linemod doesn't scream at you
const int T = 32;
int width = bottle_mask.cols;
if( width % T != 0)
width += T - width % T;
int height = bottle_mask.rows;
if( height % T != 0)
height += T - height % T;
//in this case template_backproj is created specifically from a sample bottle_backproj
cv::Rect padded_roi( (width - template_backproj.cols)/2, (height - template_backproj.rows)/2, template_backproj.cols, template_backproj.rows);
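//cv::Mat::zeros takes (rows, cols, type), so height comes before width
cv::Mat padded_backproj = cv::Mat::zeros( height, width, template_backproj.type());
//assigning a Mat to an ROI only rebinds the header; copyTo actually writes the pixels
template_backproj.copyTo( padded_backproj( padded_roi ) );
cv::Mat padded_mask = cv::Mat::zeros( height, width, template_mask.type());
template_mask.copyTo( padded_mask( padded_roi ) );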
//you might need to erode padded_mask by a few pixels.
//initialize detector
std::vector< cv::Ptr<cv::linemod::Modality> > modalities;
modalities.push_back( cv::makePtr<cv::linemod::ColorGradient>() ); //for those that don't have a kinect
cv::Ptr<cv::linemod::Detector> new_detector = cv::makePtr<cv::linemod::Detector>(modalities, T_at_level);
//add sample images to the detector
std::vector<cv::Mat> template_images;
template_images.push_back( padded_backproj);
cv::Rect ignore_me;
const std::string class_id = "bottle";
int template_id = new_detector->addTemplate(template_images, class_id, padded_mask, &ignore_me);
Then do some matching
std::vector<cv::Mat> sources_vec;
sources_vec.push_back( padded_backproj );
//padded_backproj doesn't need to be the same size as the trained template images, but it does need to be padded the same way.
float matching_threshold = 0.8; //a higher number makes the algorithm faster
std::vector<cv::linemod::Match> matches;
std::vector<cv::String> class_ids;
new_detector->match(sources_vec, matching_threshold, matches,class_ids);
float confidence = matches.size() > 0? matches[0].similarity : 0;
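Answer 2: (score: 2)
As cyriel said, the aspect ratio (width/height) could be a useful measure. Here is some OpenCV Python code that finds contours (hopefully including the outline of the bottle or can) and gives you the aspect ratio and some other measurements: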
import cv2
# src image should have already had some contrast enhancement (such as
# cv2.threshold) and edge finding (such as cv2.Canny)
# findContours returns 2 values in OpenCV 2.x/4.x and 3 values in 3.x;
# taking the last two works with either.
contours, hierarchy = cv2.findContours(src, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
for contour in contours:
    num_points = len(contour)
    if num_points < 5:
        # The contour has too few points to fit an ellipse. Skip it.
        continue
    # We could use area to help determine the type of object.
    # Small contours are probably false detections (not really a whole object).
    area = cv2.contourArea(contour)
    bounding_ellipse = cv2.fitEllipse(contour)
    # Note: fitEllipse reports full axis lengths, not radii; the ratio below is unaffected.
    center, radii, angle_degrees = bounding_ellipse
    # Let's define an ellipse's normal orientation to be landscape (width > height).
    # We must ensure that the ellipse's measurements match this orientation.
    if radii[0] < radii[1]:
        radii = (radii[1], radii[0])
        angle_degrees -= 90.0
    # We could use the angle to help determine the type of object.
    # A bottle or can's angle is probably approximately a multiple of 90 degrees,
    # assuming that it is at rest and not falling.
    # Calculate the aspect ratio (long axis / short axis).
    # After the swap above it is always >= 1. For example, 2.0 means the
    # object is twice as long as it is wide.
    # A bottle is probably more elongated than a can.
    aspect_ratio = radii[0] / radii[1]
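The long-axis to short-axis ratio computed above could feed a simple rule. The following is a minimal sketch, not a tuned classifier: the function name and both cut-off values (`can_min_ratio`, `bottle_min_ratio`) are hypothetical placeholders that would have to be calibrated on real images of the cans and bottles in question.

```python
def classify_by_aspect_ratio(long_axis, short_axis,
                             can_min_ratio=1.5, bottle_min_ratio=2.5):
    """Return 'bottle', 'can', or 'other' from two ellipse axis lengths.

    The two ratio cut-offs are placeholder assumptions, not measured values.
    """
    if short_axis <= 0:
        return "other"              # degenerate ellipse, nothing to classify
    ratio = long_axis / short_axis
    if ratio >= bottle_min_ratio:
        return "bottle"             # strongly elongated
    if ratio >= can_min_ratio:
        return "can"                # moderately elongated
    return "other"                  # too round or square to be either

print(classify_by_aspect_ratio(300.0, 100.0))
print(classify_by_aspect_ratio(200.0, 100.0))
print(classify_by_aspect_ratio(100.0, 100.0))
```

A ratio alone cannot separate a short, wide bottle from a tall can, so in practice it would probably be combined with the area, angle, or transparency cues mentioned in this thread.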
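To check for transparency, you can compare the picture against a known background using histogram analysis or background subtraction.
The contour's moments can be used to determine its centroid (center of gravity):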
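As a rough illustration of the background-subtraction idea, the sketch below measures how much of a region still looks like the empty background. It assumes grayscale images of equal size taken from the same camera position; the function name, the `(x, y, width, height)` ROI layout, and the `tolerance` value are assumptions made for this example.

```python
import numpy as np

def background_visible_fraction(image, background, roi, tolerance=30):
    """Fraction of ROI pixels whose gray level stays close to the background.

    `image` and `background` are 2-D uint8 arrays of the same shape.
    `roi` is (x, y, width, height). `tolerance` is an assumed gray-level
    difference below which a pixel counts as "background still visible".
    """
    x, y, w, h = roi
    # Widen the dtype first so the subtraction cannot wrap around in uint8.
    patch = image[y:y + h, x:x + w].astype(np.int16)
    ref = background[y:y + h, x:x + w].astype(np.int16)
    close = np.abs(patch - ref) <= tolerance
    return float(close.mean())

# Synthetic example: a bright background, an opaque and a see-through object.
background = np.full((100, 60), 240, np.uint8)
opaque = background.copy()
opaque[20:80, 20:40] = 40        # blocks the background completely
clear = background.copy()
clear[20:80, 20:40] = 220        # only slightly darker than the background

roi = (20, 20, 20, 60)
print(background_visible_fraction(opaque, background, roi))
print(background_visible_fraction(clear, background, roi))
```

A value near 1 suggests the background shows through the object, while a value near 0 suggests an opaque object. Labels, caps, and liquid inside a bottle will pull the value down, so the cut-off would need tuning.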
moments = cv2.moments(contour)
m00 = moments['m00']
m01 = moments['m01']
m10 = moments['m10']
centroid = (m10 / m00, m01 / m00)
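You can compare this with the center. If the object is bigger ("heavier") at one end, the centroid will be closer to that end than the center is.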
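The comparison can be sketched without OpenCV by computing the same zeroth- and first-order moments directly from a filled binary mask. Note two assumptions that differ from the snippet above: the moments are taken over the filled mask rather than a contour, and the reference "center" is the bounding-box center rather than the fitted ellipse center. The function name is made up for this example.

```python
import numpy as np

def centroid_offset(mask):
    """Return (dx, dy): centroid minus bounding-box center, as a fraction
    of the box width and height. Non-zero mask pixels are the object."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        raise ValueError("mask contains no object pixels")
    m00 = xs.size                         # area (zeroth-order moment)
    cx = xs.sum() / m00                   # m10 / m00
    cy = ys.sum() / m00                   # m01 / m00
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    center_x = (x0 + x1) / 2.0
    center_y = (y0 + y1) / 2.0
    return ((cx - center_x) / (x1 - x0 + 1),
            (cy - center_y) / (y1 - y0 + 1))

# Synthetic bottle-like silhouette: narrow neck on top, wide body below.
bottle = np.zeros((100, 60), np.uint8)
bottle[0:40, 25:35] = 1      # neck
bottle[40:100, 10:50] = 1    # body

# Synthetic can-like silhouette: a plain rectangle.
can = np.zeros((100, 60), np.uint8)
can[10:90, 20:40] = 1

print(centroid_offset(bottle), centroid_offset(can))
```

Image rows grow downward, so a positive `dy` means the centroid sits below the center, toward the wide body of the bottle. A symmetric silhouette such as a can gives an offset of zero.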
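Answer 3: (score: 1)
So, my main approach to detection was:
Bottles are transparent and cans are opaque
In general, the algorithm consisted of:
1. Take a grayscale picture.
2. Apply a binary threshold.
3. Select a convenient ROI from it.
4. Obtain its color mean and even its standard deviation.
5. Differentiate.
The implementation basically boiled down to this function (CAN and BOTTLE were defined beforehand):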
int detector(int x, int y, int width, int height, int thresholdValue, CvCapture* capture) {
    Mat img;
    Rect r;
    vector<Mat> channels;
    r = Rect(x,y,width,height);
    if ( !capture ) {
        fprintf( stderr, "ERROR: capture is NULL \n" );
        getchar();
        return -1;
    }
    img = Mat(cvQueryFrame( capture ));
    // captured frames are stored in BGR order, so convert with CV_BGR2GRAY
    cvtColor(img,img,CV_BGR2GRAY);
    threshold(img, img, 127, 255, THRESH_BINARY);
    // ROI
    Mat roiImage = img(r);
    split(roiImage, channels);
    // mean of the binarized ROI: low for a dark (opaque) object, high for a see-through one
    Scalar m = mean(channels[0]);
    float media = m[0];
    printf("Media: %f\n", media);
    if (media < thresholdValue) {
        return CAN;
    }
    else {
        return BOTTLE;
    }
}
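As can be seen, a THRESH_BINARY threshold was applied, and a plain white background was used. However, the main and most critical problem I faced with this whole approach was changes in ambient luminosity, even minor ones.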
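Sometimes I noticed that THRESH_BINARY_INV could help, but I wonder whether certain threshold parameters, or some other filter, could remove ambient lighting as a problem.
I really appreciate the approaches based on computing the aspect ratio from a bounding box or on finding contours, but I found this one simple and straightforward once the conditions were adjusted.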
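On the lighting question raised above, one option worth trying is to let the threshold follow the image instead of fixing it at 127. Otsu's method (the THRESH_OTSU flag already used in Answer 1) picks the level that best separates the two gray-level groups in the histogram. The sketch below reimplements it in NumPy purely for illustration; the function name and the synthetic "bright" and "dim" scenes are assumptions, and with OpenCV one would normally just pass the flag to `cv2.threshold`.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold t for a 2-D uint8 image.

    Pixels <= t form one class and pixels > t the other; t maximizes the
    between-class variance. Returns 0 for a perfectly uniform image.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    w0 = np.cumsum(hist)                  # pixel count with value <= t
    w1 = hist.sum() - w0                  # pixel count with value  > t
    sum0 = np.cumsum(hist * levels)       # gray-level sum of the lower class
    sum1 = sum0[-1] - sum0                # gray-level sum of the upper class
    valid = (w0 > 0) & (w1 > 0)           # both classes must be non-empty
    mean0 = np.divide(sum0, w0, out=np.zeros(256), where=valid)
    mean1 = np.divide(sum1, w1, out=np.zeros(256), where=valid)
    between = np.where(valid, w0 * w1 * (mean0 - mean1) ** 2, -1.0)
    return int(np.argmax(between))

# Synthetic scene under bright light, and the same scene at half brightness.
bright = np.full((40, 40), 230, np.uint8)
bright[10:30, 15:25] = 60                 # dark object on a light background
dim = (bright * 0.5).astype(np.uint8)

t_bright = otsu_threshold(bright)
t_dim = otsu_threshold(dim)
object_bright = bright <= t_bright        # same convention as THRESH_BINARY_INV
object_dim = dim <= t_dim
print(t_bright, t_dim, np.array_equal(object_bright, object_dim))
```

In this synthetic case the fixed level of 127 would mark the whole dim image as dark, while the adaptive level recovers the same object mask in both. Otsu compensates for an overall brightness change only; uneven lighting across the frame would need a different remedy.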
Answer 4: (score: 0)
I'd use deep learning, based on transfer learning.
The idea is this: given a highly complex, well-trained neural network that was trained on a similar classification task (typically over a large public dataset, like ImageNet), you can freeze the majority of its weights and only train the last layers. There are lots of tutorials out there. You don't need to have a background in deep learning.
There is a tutorial that works almost out of the box with TensorFlow here, and another one based on Keras here.