Encoding bounding boxes for training

Time: 2018-11-14 18:31:27

Tags: python keras conv-neural-network object-detection

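I'm trying to train a basic convolutional network to localize objects in images. I convert each image to a numpy array of shape (480, 640, 3) (height, width, RGB). The target is a 2D numpy array of box points with shape (18, 2), i.e. (max_boxes, (x, y)).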

I feed this into a fairly simple convolutional model (I can post the source if it helps) that is good at classifying images.

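When I train the model, it quickly reaches an accuracy of about 0.77 with a loss of about 35. After that it stops learning, and when I test it on an image it performs very poorly.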
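One possible reason the 0.77 figure says little about localization quality, sketched under assumptions: if the model was compiled with `metrics=['accuracy']` and a regression loss, Keras 2.x, as far as I understand, resolves that to categorical accuracy, which only compares the argmax along the last axis (here, whether x or y is larger). The helper names `argmax_agreement` and `mean_pixel_error` and the sample arrays below are hypothetical and are not part of the original code.

```python
import numpy as np

def argmax_agreement(y_true, y_pred):
    """Fraction of points where the argmax over (x, y) matches."""
    same = np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)
    return float(np.mean(same))

def mean_pixel_error(y_true, y_pred, width=640, height=480):
    """Mean Euclidean distance in pixels between normalized points."""
    scale = np.array([width, height], dtype="float32")
    diff = (y_true - y_pred) * scale
    return float(np.mean(np.linalg.norm(diff, axis=-1)))

# Made-up normalized points: the predictions are far from the targets,
# yet y > x holds for every point in both arrays.
y_true = np.array([[[0.10, 0.20], [0.30, 0.40]]], dtype="float32")
y_pred = np.array([[[0.50, 0.90], [0.60, 0.95]]], dtype="float32")

print(argmax_agreement(y_true, y_pred))  # argmax matches on every point
print(mean_pixel_error(y_true, y_pred))  # still hundreds of pixels off
```

A distance-based metric such as the second function is usually easier to interpret for coordinate regression than an accuracy value.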

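Why is this? Have I encoded the target data incorrectly, or is something going wrong when training the network? What is the "standard" way to do this, and how can I fix it?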

Example output

Raw output:
[[[0.02862035 0.02726501]
  [0.02820634 0.02730229]
  [0.02834299 0.02714084]
  [0.02819964 0.02701551]
  [0.02831725 0.02733742]
  [0.02820959 0.02748821]
  [0.02811676 0.02731069]
  [0.0284538  0.0269504 ]
  [0.02842114 0.02728475]
  [0.02858478 0.02691348]
  [0.02842172 0.0273287 ]
  [0.02865605 0.02686743]
  [0.02795038 0.02733723]
  [0.0287733  0.02693835]
  [0.02838135 0.02726109]
  [0.02785174 0.02761061]
  [0.03038537 0.02532353]
  [0.02849848 0.0269334 ]]]
Formatted:
[[[18.317022   17.44961   ]
  [13.539045   13.105101  ]
  [ 0.02834299  0.02714084]
  [ 0.02819964  0.02701551]
  [ 0.02831725  0.02733742]
  [ 0.02820959  0.02748821]
  [ 0.02811676  0.02731069]
  [ 0.0284538   0.0269504 ]
  [ 0.02842114  0.02728475]
  [ 0.02858478  0.02691348]
  [ 0.02842172  0.0273287 ]
  [ 0.02865605  0.02686743]
  [ 0.02795038  0.02733723]
  [ 0.0287733   0.02693835]
  [ 0.02838135  0.02726109]
  [ 0.02785174  0.02761061]
  [ 0.03038537  0.02532353]
  [ 0.02849848  0.0269334 ]]]
Correct:
[[44.0, 50.0], [52.0, 51.5], [39.5, 44.5], [42.5, 46.0], [46.5, 48.0], [46.5, 48.0], [53.5, 55.5], [33.5, 53.0], [51.0, 57.0], [45.5, 54.0]]

Formatting

I multiply each element by the width and height to map the normalized values back to pixel coordinates:

for e in output:
    e[0] *= 640
    e[1] *= 480
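A note on the loop above, offered as a sketch rather than a confirmed diagnosis: if `output` has shape (1, 18, 2), as the raw output suggests, iterating over it yields one (18, 2) array per image, so `e[0]` and `e[1]` are the first two points rather than the x and y columns. That would be consistent with only the first two rows changing in the formatted output. The array below is made-up data; only the 640 and 480 come from the post.

```python
import numpy as np

WIDTH, HEIGHT = 640, 480  # image size stated in the question

# Hypothetical prediction batch: 1 image, 18 points, (x, y) in [0, 1].
output = np.full((1, 18, 2), 0.5, dtype="float32")

# Row-wise loop, as in the snippet above: each `e` has shape (18, 2),
# so e[0] and e[1] select the first two points, not the two columns.
looped = output.copy()
for e in looped:
    e[0] *= WIDTH
    e[1] *= HEIGHT

# Column-wise scaling: broadcasting multiplies every x by the width
# and every y by the height along the last axis.
scaled = output * np.array([WIDTH, HEIGHT], dtype="float32")

print(looped[0, :3])  # first two rows scaled, third row untouched
print(scaled[0, :3])  # every row scaled as (x * 640, y * 480)
```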

Encoding the images

from numpy import array

def format_data(X):
    X = array(X)
    n, h, w, c = X.shape
    X = X.reshape(n, h, w, c).astype("float32")
    X = X / 255 # normalize between 0 and 1
    return X

Encoding the target data

from numpy import array, zeros

def format_output(y, max_box_count=0):
    for e in y:
        if len(e) > max_box_count:
            max_box_count = len(e)

    y_array = []
    for e in y:
        e_zeros = zeros((max_box_count, 2))
        e_zeros[:len(e)] = e
        y_array.append(e_zeros)

    for e in y_array:
        e[0] /= 640
        e[1] /= 480

    return array(y_array).astype('float32'), max_box_count
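In `format_output`, each `e` in `y_array` has shape (max_box_count, 2), so `e[0] /= 640` and `e[1] /= 480` divide the first two points rather than the x and y columns, and the remaining points appear to stay in pixel units. A minimal sketch of a column-wise alternative follows, assuming targets are lists of (x, y) points in pixels; the function name `encode_points` and the returned padding mask are hypothetical additions, not part of the original code.

```python
import numpy as np

def encode_points(y, width=640, height=480, max_points=None):
    """Pad per-image point lists to a fixed shape and scale to [0, 1].

    Returns (targets, mask): targets has shape (n, max_points, 2) and
    mask has shape (n, max_points), with 1.0 marking real points.
    """
    if max_points is None:
        max_points = max(len(points) for points in y)

    targets = np.zeros((len(y), max_points, 2), dtype="float32")
    mask = np.zeros((len(y), max_points), dtype="float32")
    for i, points in enumerate(y):
        if len(points) == 0:
            continue  # image without points: leave zeros and an empty mask
        targets[i, :len(points)] = points
        mask[i, :len(points)] = 1.0

    # Divide the x column by the width and the y column by the height.
    targets /= np.array([width, height], dtype="float32")
    return targets, mask

# Made-up example: two images with two points and one point.
targets, mask = encode_points(
    [[[44.0, 50.0], [52.0, 51.5]], [[320.0, 240.0]]]
)
print(targets.shape, mask)
```

The mask is optional; it is one way to tell padded rows apart from real points, for example when computing a loss that should ignore the padding.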

0 answers:

No answers yet