Question

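I am currently working on a computer vision application that involves tracking people. I want to build ground-truth metadata for the videos that will be recorded in this project. The metadata will probably have to be labeled by hand and will consist mainly of the positions of the people in each image. I would like to use this metadata to evaluate the performance of my algorithms.

I could of course build a labeling tool myself using, e.g., Qt and/or OpenCV, but I was wondering whether there is some kind of de facto standard for this. I came across ViPER, but it seems to be dead and is not as easy to use as I had hoped. Apart from that, I could not find much.

Does anyone here have recommendations on which software/standard/method to use for both the labeling and the evaluation? My main preference is for something C++-oriented, but this is not a hard constraint.

Kind regards and thanks in advance! Tom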

Answer 1

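I had another look at vatic and got it to work. It is an online video annotation tool meant for crowdsourcing through a commercial service, and it runs on Linux. However, there is also an offline mode. In this mode the crowdsourcing service is not needed, and the software runs stand-alone.

The installation is described in detail in the enclosed README file. Among other things, it involves setting up an Apache and a MySQL server, some Python packages, and ffmpeg. It is not difficult if you follow the README. (I mentioned that I had some issues with my proxy, but those were not related to this software package.)

You can try it out with the online demo. The default output looks like this: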

0 302 113 319 183 0 1 0 0 "person"
0 300 112 318 182 1 1 0 1 "person"
0 298 111 318 182 2 1 0 1 "person"
0 296 110 318 181 3 1 0 1 "person"
0 294 110 318 181 4 1 0 1 "person"
0 292 109 318 180 5 1 0 1 "person"
0 290 108 318 180 6 1 0 1 "person"
0 288 108 318 179 7 1 0 1 "person"
0 286 107 317 179 8 1 0 1 "person"
0 284 106 317 178 9 1 0 1 "person"

Each line contains 10 or more columns, separated by spaces. The columns are defined as follows:

1   Track ID. All rows with the same ID belong to the same path.
2   xmin. The top left x-coordinate of the bounding box.
3   ymin. The top left y-coordinate of the bounding box.
4   xmax. The bottom right x-coordinate of the bounding box.
5   ymax. The bottom right y-coordinate of the bounding box.
6   frame. The frame that this annotation represents.
7   lost. If 1, the annotation is outside of the view screen.
8   occluded. If 1, the annotation is occluded.
9   generated. If 1, the annotation was automatically interpolated.
10  label. The label for this annotation, enclosed in quotation marks.
11+ attributes. Each column after this is an attribute.
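
For anyone who wants to load this text output into their own evaluation code, here is a minimal parsing sketch in Python. It is not part of vatic; the `Annotation` class and the `parse_line` / `parse_annotations` helpers are names I made up for illustration, and I am assuming that the coordinates are integers (as in the sample above) and that any attribute columns are quoted the same way as the label.

```python
import shlex
from dataclasses import dataclass, field
from typing import List


@dataclass
class Annotation:
    """One row of the text output, following the column list above."""
    track_id: int
    xmin: int
    ymin: int
    xmax: int
    ymax: int
    frame: int
    lost: bool
    occluded: bool
    generated: bool
    label: str
    attributes: List[str] = field(default_factory=list)


def parse_line(line: str) -> Annotation:
    # shlex.split keeps quoted strings together and strips the quotes,
    # so a label such as "traffic light" would stay one column.
    parts = shlex.split(line)
    if len(parts) < 10:
        raise ValueError(f"expected at least 10 columns, got {len(parts)}")
    track_id, xmin, ymin, xmax, ymax, frame = (int(p) for p in parts[:6])
    lost, occluded, generated = (p == "1" for p in parts[6:9])
    # Assumption: everything after column 10 is an attribute string.
    return Annotation(track_id, xmin, ymin, xmax, ymax, frame,
                      lost, occluded, generated, parts[9], parts[10:])


def parse_annotations(text: str) -> List[Annotation]:
    # Blank lines are skipped so a trailing newline does not raise.
    return [parse_line(row) for row in text.splitlines() if row.strip()]


sample = '''0 302 113 319 183 0 1 0 0 "person"
0 300 112 318 182 1 1 0 1 "person"'''
annotations = parse_annotations(sample)
print(annotations[0])
```

Filtering on `generated` afterwards gives you only the hand-labeled keyframes, which may be what you want if you do not trust the interpolated boxes.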

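It can also provide the output in XML, JSON, pickle, LabelMe, and PASCAL VOC formats.

So, all in all, this does pretty much what I wanted, and it is also rather easy to use. I am still interested in other options, though!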
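
On the evaluation half of the question: one common way to score a tracker against bounding boxes like these is intersection over union (IoU) per frame. The sketch below is only an illustration under my own assumptions, not something vatic provides: `iou` and `hit_rate` are hypothetical helper names, there is a single tracked person per frame, boxes are `(xmin, ymin, xmax, ymax)` tuples keyed by frame number, and the 0.5 threshold is just the convention used by PASCAL VOC style detection benchmarks.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped to zero when disjoint.
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_w * overlap_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def hit_rate(truth: Dict[int, Box], predicted: Dict[int, Box],
             threshold: float = 0.5) -> float:
    """Fraction of ground-truth frames whose predicted box has IoU >= threshold.

    Frames without a prediction count as misses.
    """
    if not truth:
        return 0.0
    hits = sum(
        1 for frame, box in truth.items()
        if frame in predicted and iou(box, predicted[frame]) >= threshold
    )
    return hits / len(truth)


# Ground truth taken from the first two sample rows above;
# the predictions are made-up values for illustration.
truth = {0: (302, 113, 319, 183), 1: (300, 112, 318, 182)}
predicted = {0: (302, 113, 319, 183), 1: (400, 300, 420, 380)}
print(hit_rate(truth, predicted))
```

With several people per frame you would additionally need to match predicted tracks to ground-truth tracks before scoring, which this sketch deliberately leaves out.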

Answer 2

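LabelMe is another open annotation tool. I think it is less suited to my particular case, but it is still worth mentioning. It seems to be geared toward blob labeling.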

Answer 3

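This is a problem that every computer vision practitioner faces. If you are really serious about it, there is a company that will do this for you through crowdsourcing. I don't know whether I should post a link to it on this site, though.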

Answer 4

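I ran into the same problem while looking for an image annotation tool to build a ground-truth dataset for training image analysis models.

If your annotations require polygonal outlines, LabelMe is a solid option. I have used it before and it does the job well, and it has some extra nice features for 3D feature extraction. Besides LabelMe, I also made an open-source tool called LabelD. If you are still looking for an annotation tool, check it out!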

Ground truth data collection and evaluation for computer vision
