Question

I am following the tutorial on the MATLAB web page and trying to perform object detection with R-CNN on my own dataset. According to the figure below:

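I should put the image path in the first column and the bounding box of each object in the following columns. However, each of my images contains multiple objects of each class. For example, one image has 20 vehicles. How should I handle this? Should I create a separate row for every vehicle instance in the image?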

Answer 1

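The example on the website finds the pixel neighborhood with the highest score and draws a bounding box around that region of the image. Having multiple objects complicates things. There are two approaches you can use to help find multiple objects.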

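1. Find all bounding boxes whose scores surpass some global threshold.
2. Find the bounding box with the highest score, then keep the bounding boxes whose scores exceed a percentage of that score. The percentage is arbitrary, but from experience and from what I have seen in practice, people tend to choose between 80% and 95% of the largest score found in the image. This will of course give you false positives if you submit a query image that contains none of the objects the classifier was trained to detect, so you will have to implement some more post-processing logic on your end.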

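Another approach is to choose some value k and display the top k bounding boxes, that is, those associated with the k highest scores. This of course requires you to know the value of k ahead of time, and, like the second approach, it always assumes that an object was found in the image.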
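As a rough illustration of the top-k idea, here is a minimal sketch. The bboxes and score values are made-up stand-ins for the outputs of detect, and k = 2 is an arbitrary choice, not something taken from the tutorial.

```matlab
% Hypothetical detector outputs (stand-ins for what detect returns)
bboxes = [10 10 50 50; 60 20 40 40; 15 80 30 30; 100 100 20 20];
score  = [0.91; 0.42; 0.77; 0.65];

k = 2;                      % Assumed number of boxes to keep
k = min(k, numel(score));   % Guard against fewer than k detections

% Sort the scores from highest to lowest and keep the first k
[sortedScore, order] = sort(score, 'descend');
topIdx    = order(1:k);
topBoxes  = bboxes(topIdx, :);
topScores = sortedScore(1:k);
```

The labels returned by detect can be subset the same way, with label(topIdx).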

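Apart from the logic above, the approach you describe is correct: you need to create a separate row for each vehicle instance in the image. In other words, if a single image contains multiple object candidates, you add one row per instance and keep the image file name the same. So if one image contains 20 vehicles, you create 20 rows in the table that all have exactly the same file name, and each row holds a single bounding box specification for one distinct object in that image.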

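Once you have done this, and assuming you have already trained the R-CNN detector and want to use it, the original code to detect objects looks like this: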

% Read test image
testImage = imread('stopSignTest.jpg');

% Detect stop signs
[bboxes, score, label] = detect(rcnn, testImage, 'MiniBatchSize', 128)

% Display the detection results
[score, idx] = max(score);

bbox = bboxes(idx, :);
annotation = sprintf('%s: (Confidence = %f)', label(idx), score);

outputImage = insertObjectAnnotation(testImage, 'rectangle', bbox, annotation);

figure
imshow(outputImage)

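This only handles the single object with the highest score. To do this for multiple objects, use the score output of the detect method and find the detections that satisfy either approach 1 or approach 2.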

If you go with approach 1, the code is modified to look like this.

% Read test image
testImage = imread('stopSignTest.jpg');

% Detect stop signs
[bboxes, score, label] = detect(rcnn, testImage, 'MiniBatchSize', 128)

% New - Find those bounding boxes that surpassed a threshold
T = 0.7; % Define threshold here
idx = score >= T;

% Retrieve those scores that surpassed the threshold
s = score(idx);

% Do the same for the labels as well
lbl = label(idx);

bbox = bboxes(idx, :); % This logic doesn't change

% New - Loop through each box and print out its confidence on the image
outputImage = testImage; % Make a copy of the test image to write to
for ii = 1 : size(bbox, 1)
    annotation = sprintf('%s: (Confidence = %f)', lbl(ii), s(ii)); % Change    
    outputImage = insertObjectAnnotation(outputImage, 'rectangle', bbox(ii,:), annotation); % New - Choose the right box
end

figure
imshow(outputImage)

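Note that I kept the original bounding boxes, labels, and scores in their original variables and stored the subset that surpassed the threshold in separate variables, in case you want to cross-reference the two. If you want to use approach 2 instead, the code stays the same as for approach 1 except for how the threshold is defined.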

The code changes from:

% New - Find those bounding boxes that surpassed a threshold
T = 0.7; % Define threshold here
idx = score >= T;
% [score, idx] = max(score);

... to:

% New - Find those bounding boxes that surpassed a threshold
perc = 0.85; % 85% of the maximum score
T = perc * max(score); % Define threshold here
idx = score >= T;

The end result is multiple bounding boxes for the objects detected in the image, with one annotation per detected object.
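The relative threshold always keeps at least the highest-scoring box, even when nothing in the image is a real object. One possible piece of post-processing, sketched here as an assumption rather than as part of the tutorial, is to combine the percentage with an absolute floor. The score and bboxes values and the minScore value are made up for illustration.

```matlab
% Hypothetical detector outputs where every detection is weak
bboxes = [10 10 50 50; 60 20 40 40; 15 80 30 30];
score  = [0.30; 0.28; 0.10];

perc     = 0.85;  % Relative threshold, as in approach 2
minScore = 0.5;   % Assumed absolute floor; tune this for your data

% A box must pass both the relative and the absolute threshold
T   = max(perc * max(score), minScore);
idx = score >= T;

keptBoxes  = bboxes(idx, :);
keptScores = score(idx);
```

With these sample scores the floor of 0.5 wins over 0.85 * 0.30, so no box is kept and nothing would be annotated.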

Answer 2

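I think you actually have to put all of the bounding box coordinates for that image in a single entry of the training data table. See this MATLAB tutorial for details. If you load the training data locally in MATLAB and inspect the variable, you will actually see this (sorry, my reputation is not high enough to include an image directly in my answer).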

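To summarize, in the training data table, make sure each image has one unique entry, then put its multiple bounding boxes under the corresponding category as a matrix in which each row has the format [x, y, width, height].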
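A minimal sketch of that table layout follows. The file names, the vehicle column name, and the box coordinates are all hypothetical and only illustrate the shape of the data: one row per image, with an M-by-4 matrix of boxes in the class column.

```matlab
% Hypothetical image paths, one per row of the table
imageFilename = {'images/img001.jpg'; 'images/img002.jpg'};

% Each row of a box matrix is [x, y, width, height]
boxes1 = [ 30 40 60 35;
          120 45 55 30;
          200 60 70 40];   % Three vehicles in the first image
boxes2 = [ 15 20 80 50];   % One vehicle in the second image

% One cell per image, holding all of that image's boxes
vehicle = {boxes1; boxes2};

trainingData = table(imageFilename, vehicle);
```

An image with 20 vehicles would then have a single row whose vehicle cell holds a 20-by-4 matrix.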
