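I am trying to train a basic convolutional network to locate objects in an image. I convert each image into a numpy array of shape (480, 640, 3)
(height, width, RGB). The target is a 2D numpy array of box points with shape (18, 2)
(max_boxes, (x, y)).
I feed this into a fairly simple convolutional model (I can post the source if that helps), which is good at classifying images.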
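When I train the model, it quickly reaches an accuracy of about 0.77 with a loss as high as ~35. After that it stops learning, and when I test it on an image it performs very poorly.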
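Why is this? Am I encoding the target data incorrectly, or is something going wrong when I train the network? What is the "standard" way to do this, and how can I fix it?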
Example output
Raw output:
[[[0.02862035 0.02726501]
[0.02820634 0.02730229]
[0.02834299 0.02714084]
[0.02819964 0.02701551]
[0.02831725 0.02733742]
[0.02820959 0.02748821]
[0.02811676 0.02731069]
[0.0284538 0.0269504 ]
[0.02842114 0.02728475]
[0.02858478 0.02691348]
[0.02842172 0.0273287 ]
[0.02865605 0.02686743]
[0.02795038 0.02733723]
[0.0287733 0.02693835]
[0.02838135 0.02726109]
[0.02785174 0.02761061]
[0.03038537 0.02532353]
[0.02849848 0.0269334 ]]]
Formatted:
[[[18.317022 17.44961 ]
[13.539045 13.105101 ]
[ 0.02834299 0.02714084]
[ 0.02819964 0.02701551]
[ 0.02831725 0.02733742]
[ 0.02820959 0.02748821]
[ 0.02811676 0.02731069]
[ 0.0284538 0.0269504 ]
[ 0.02842114 0.02728475]
[ 0.02858478 0.02691348]
[ 0.02842172 0.0273287 ]
[ 0.02865605 0.02686743]
[ 0.02795038 0.02733723]
[ 0.0287733 0.02693835]
[ 0.02838135 0.02726109]
[ 0.02785174 0.02761061]
[ 0.03038537 0.02532353]
[ 0.02849848 0.0269334 ]]]
Correct:
[[44.0, 50.0], [52.0, 51.5], [39.5, 44.5], [42.5, 46.0], [46.5, 48.0], [46.5, 48.0], [53.5, 55.5], [33.5, 53.0], [51.0, 57.0], [45.5, 54.0]]
Formatting: I multiply each element by the width and height to convert it back from normalized values to pixels:
for e in output:
    e[0] *= 640  # image width
    e[1] *= 480  # image height
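One thing that may be worth checking here: if `output` has shape (1, 18, 2), as the raw output above suggests, then inside the loop `e` is an (18, 2) array, so `e[0]` and `e[1]` select the first two boxes rather than the x and y columns. That would be consistent with the formatted output, where only the first two rows change. Below is a minimal sketch of column-wise scaling instead. It assumes x is stored at index 0 and y at index 1 of the last axis, and `denormalize_boxes` is a hypothetical helper name, not part of my original code.

```python
import numpy as np

IMG_WIDTH, IMG_HEIGHT = 640, 480  # image size used in the question


def denormalize_boxes(output, width=IMG_WIDTH, height=IMG_HEIGHT):
    """Scale normalized (x, y) points back to pixel coordinates.

    Hypothetical helper. Assumes `output` has shape (..., 2), with x at
    index 0 and y at index 1 of the last axis.
    """
    scaled = np.array(output, dtype="float32")  # copy, input is left untouched
    scaled[..., 0] *= width   # every x value
    scaled[..., 1] *= height  # every y value
    return scaled


# Placeholder data with the same shape as the raw output above
raw = np.full((1, 18, 2), 0.5, dtype="float32")
pixels = denormalize_boxes(raw)
```

Indexing with `[..., 0]` touches the x value of every box in every sample, so the same helper should work for a single prediction or a whole batch.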
Encoding the images
from numpy import array

def format_data(X):
    X = array(X)
    n, h, w, c = X.shape  # (samples, height, width, channels)
    X = X.reshape(n, h, w, c).astype("float32")
    X = X / 255  # normalize to the range [0, 1]
    return X
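Encoding the target data
from numpy import array, zeros

def format_output(y, max_box_count=0):
    # Find the largest number of boxes in any sample
    for e in y:
        if len(e) > max_box_count:
            max_box_count = len(e)
    # Zero-pad every sample to max_box_count rows of (x, y)
    y_array = []
    for e in y:
        e_zeros = zeros((max_box_count, 2))
        e_zeros[:len(e)] = e
        y_array.append(e_zeros)
    # Scale by the image size: e[0] is divided by 640 (width), e[1] by 480 (height)
    for e in y_array:
        e[0] /= 640
        e[1] /= 480
    return array(y_array).astype('float32'), max_box_count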
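A possible issue with the function above: each `e` in `y_array` has shape (max_box_count, 2), so `e[0]` and `e[1]` are the first two boxes, not the x and y columns. If so, only those two rows get scaled and the remaining targets stay in pixel units, which might explain the large loss. The following is a sketch of a variant that scales by column. It assumes every entry of `y` is a sequence of (x, y) pixel pairs, and the name `encode_targets` and its `width`/`height` parameters are hypothetical.

```python
import numpy as np


def encode_targets(y, width=640, height=480, max_box_count=0):
    """Zero-pad each list of (x, y) points and scale it to [0, 1].

    Hypothetical variant of format_output. Assumes each entry of `y`
    is a sequence of (x, y) pairs given in pixels.
    """
    max_box_count = max([max_box_count] + [len(e) for e in y])
    y_array = np.zeros((len(y), max_box_count, 2), dtype="float32")
    for i, e in enumerate(y):
        if len(e) > 0:  # skip samples that have no boxes
            y_array[i, :len(e)] = e
    y_array[..., 0] /= width   # x value of every box
    y_array[..., 1] /= height  # y value of every box
    return y_array, max_box_count


# Two samples: one with two points, one with a single point
targets, n_boxes = encode_targets(
    [[[44.0, 50.0], [52.0, 51.5]], [[320.0, 240.0]]]
)
```

Note that the padded rows stay at (0, 0), so the network cannot tell them apart from a real point at the origin. Whether that needs a separate mask or a box-count output is a design decision this sketch does not address.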