Question

I am trying to perform some basic action recognition using the KTH dataset.

I am using the 3DSIFT feature extractor from UCF (link), which extracts SIFT descriptors at given x, y, and z coordinates.

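For feature detection, I am using Selective STIPs (link), which has been shown to be very effective for action recognition. According to the source code provided by the authors, it produces the following output: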

    % @output : corner_points, P X 4 matrix, where P is the number of interest
    %           point found in the image_stack and each interest point contains
    %           4 values :: [X,Y] coordinate of the interest point, frame
    %           number, scale at which it is detected.

Am I correct in assuming that the frame number given here is also the Z coordinate that 3DSIFT requires?

I extracted STIPs from a video clip and got the expected output, but I am getting multiple X and Y values on each frame:

[71,24,1]
[54,26,1]
[86,29,1]
...
..
.

Is this the expected output, and is it acceptable input for 3DSIFT?

Answer 1

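Yes. From what I can tell from 3DSIFT, Z is equivalent to the frame number when processing video. So the x, y, frame values output by STIPs should be passed as the x, y, z input to 3DSIFT.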
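The mapping described above can be sketched as follows. This is only an illustration in Python, not the authors' MATLAB code: the function names `stips_to_xyz` and `group_by_frame` are hypothetical, and the scale values in the sample rows are made up (the x, y, frame values are the ones from the question). It assumes each row follows the documented `[X, Y, frame, scale]` layout. In MATLAB, the same step would presumably be a plain column slice such as `corner_points(:, 1:3)`.

```python
from collections import defaultdict


def stips_to_xyz(corner_points):
    """Drop the scale column: [x, y, frame, scale] -> (x, y, z), with z = frame."""
    return [(x, y, frame) for x, y, frame, _scale in corner_points]


def group_by_frame(xyz_points):
    """Collect the (x, y) positions that share the same frame (z value)."""
    frames = defaultdict(list)
    for x, y, z in xyz_points:
        frames[z].append((x, y))
    return dict(frames)


# Hypothetical rows: x, y, frame are taken from the question,
# the scale values are invented for illustration only.
corner_points = [
    [71, 24, 1, 2.0],
    [54, 26, 1, 2.0],
    [86, 29, 1, 4.0],
]

xyz = stips_to_xyz(corner_points)
print(xyz)                  # one (x, y, z) triple per interest point
print(group_by_frame(xyz))  # several (x, y) positions can share one frame
```

Grouping by frame simply makes it visible that several interest points can fall on the same frame; each of them would be passed on as its own (x, y, z) location.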
