Question

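I'm digging into OpenCV's implementation of SIFT descriptor extraction. I came across some puzzling code that computes the radius of the interest point neighborhood. Below is the annotated code, with variable names changed to be more descriptive: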

// keep octave below 256 (255 is 1111 1111)
int octave = kpt.octave & 255;
// if octave is >= 128, ...????
octave = octave < 128 ? octave : (-128 | octave);
// 1/2^absval(octave)
float scale = octave >= 0 ? 1.0f/(1 << octave) : (float)(1 << -octave);
// multiply the point's radius by the calculated scale
float scl = kpt.size * 0.5f * scale;
// the constant sclFactor is 3 and has the following comment:
// determines the size of a single descriptor orientation histogram
float histWidth = sclFactor * scl;
// descWidth is the number of histograms on one side of the descriptor
// the long float is sqrt(2)
int radius = (int)(histWidth * 1.4142135623730951f * (descWidth + 1) * 0.5f);

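As far as I can tell, this has to do with converting to the scale the interest point was detected at (I have read Lowe's paper), but I can't connect the dots to the code. Specifically, I don't understand the first 3 lines and the last line.

I need to understand this to create a similar local point descriptor for motion.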

Answer 1

I don't understand the first 3 lines

Indeed, this SIFT implementation encodes several values in the KeyPoint octave attribute. If you refer to line 439, you can see that:

kpt.octave = octv + (layer << 8) + (cvRound((xi + 0.5)*255) << 16);

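This means the octave is stored in the first byte, the layer in the second byte, and so on.

So kpt.octave & 255 (which can be found in the unpackOctave method) just masks the keypoint octave field to retrieve the effective octave value.

Also: this SIFT implementation uses a negative first octave (int firstOctave = -1) to work with a higher-resolution version of the image. Since octave indices start at 0, a mapping is computed: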

octave index = 0 => 255
octave index = 1 => 0
octave index = 2 => 1
...

This mapping is computed at line 790:

kpt.octave = (kpt.octave & ~255) | ((kpt.octave + firstOctave) & 255);

So the second line above is just a way to map these values back:

octave = 255 => -1
octave = 0   => 0
octave = 1   => 1
..

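And the third line is just a way to compute the scale, taking into account that negative octaves give a scale > 1; e.g. 1 << -octave gives 2 for octave = -1, which means the size is doubled.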
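
The round trip described above can be sketched as follows. The function names are hypothetical, and storeOctave only handles the low byte (the real line 790 also preserves the layer bits); the arithmetic itself is copied from the lines under discussion:

```cpp
// Hypothetical helpers mirroring the arithmetic discussed above.

// Low-byte part of line 790: shift the index by firstOctave and wrap into 0..255.
int storeOctave(int octaveIndex, int firstOctave) {
    return (octaveIndex + firstOctave) & 255;
}

// Second line of the question: sign-extend the stored byte.
// Values 0..127 are unchanged, values 128..255 become -128..-1.
int signedOctave(int packed) {
    int octave = packed & 255;
    return octave < 128 ? octave : (-128 | octave);
}

// Third line of the question: scale = 2^(-octave).
float octaveScale(int octave) {
    return octave >= 0 ? 1.0f / (1 << octave) : static_cast<float>(1 << -octave);
}
```

With firstOctave = -1, octave index 0 is stored as 255, read back as -1, and yields a scale of 2; octave index 2 is stored as 1 and yields a scale of 0.5.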

[I don't understand] the last line.

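Basically, it corresponds to the radius of the circle that encloses a square patch of side D, hence the multiplication by sqrt(2) and the division by 2. D is computed by multiplying: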

the keypoint scale,
a magnification factor = 3,
the width of the descriptor histogram = 4, rounded up to the next integer (hence the +1)

Indeed, you can find a detailed explanation in vlfeat's SIFT implementation:

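The support of each spatial bin has an extension of SBP = 3sigma pixels, where sigma is the scale of the keypoint. Thus all the bins together have a support SBP x NBP pixels wide. Since weighting and interpolation of pixels is used, the support extends by another half bin. Therefore, the support is a square window of SBP x (NBP + 1) pixels. Finally, since the patch can be arbitrarily rotated, we need to consider a window 2W += sqrt(2) x SBP x (NBP + 1) pixels wide.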
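
Put together, the last line of the question can be read as the sketch below. descriptorRadius is a hypothetical function, its parameter names follow the renamed variables in the question, and the multiplication is regrouped for readability, so float rounding may differ marginally from the original expression:

```cpp
// Hypothetical function; parameter names follow the question's renamed variables.
int descriptorRadius(float kptSize, float scale, float sclFactor, int descWidth) {
    float scl = kptSize * 0.5f * scale;        // keypoint scale at the octave's resolution
    float histWidth = sclFactor * scl;         // width of one spatial bin (SBP)
    float side = histWidth * (descWidth + 1);  // square support: SBP x (NBP + 1)
    // Half the diagonal of that square is the radius of the enclosing circle.
    return static_cast<int>(side * 1.4142135623730951f * 0.5f);
}
```

For example, with kptSize = 4, scale = 1, sclFactor = 3 and descWidth = 4, the square side is 30 pixels and the returned radius is 21.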

Finally, I strongly recommend that you refer to this vlfeat SIFT documentation.

OpenCV SIFT descriptor keypoint radius
