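I have trained a CNN model whose accuracy is close to 0.93. It was trained on The Street View House Numbers (SVHN) Dataset, which consists of 10 classes representing the 10 digits (0-9).
The dataset I used consists of 32x32 images, which are resized to 227x227, the input shape the model was designed for. To detect bounding boxes, I am using an image pyramid together with a sliding window.
This is how I do it: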
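import cv2

def pyramid(image):
    # Return the original image followed by 4 successively downsampled copies.
    pyr = [image]
    pd = image
    for i in range(4):
        pdown = cv2.pyrDown(pd)
        pd = pdown
        pyr.append(pdown)
    return pyr

window = (48, 48)  # (height, width) of the sliding window
stride = (4, 4)    # (horizontal, vertical) step in pixels

def slider():
    # Reads the module-level `img`, which the prediction loop reassigns per pyramid level.
    for row in range(0, img.shape[0], stride[1]):
        for col in range(0, img.shape[1], stride[0]):
            # yield the current window
            yield (col, row, img[row:row+window[0], col:col+window[1]])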
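The pyramid method generates up to 4 downsampled pyramid levels in addition to the original image. The slider method moves a 48x48 sliding window with a 4x4 stride. This is how I use them for prediction: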
import cv2
import keras
import numpy as np

l = 1  # running index for the saved window files
loaded_model = keras.models.load_model("model.hdf5")
# `img` is the input image, loaded before this point.
for resized in pyramid(img):
    img = resized  # slider() reads the global `img`
    i = 0
    for (x, y, win) in slider():
        i = i + 1
        # Skip windows truncated at the right or bottom border.
        if win.shape[0] != window[0] or win.shape[1] != window[1]:
            continue
        winc = np.copy(win)
        win = cv2.resize(win, (227, 227))
        win = win.astype('float32') / 255.
        # One forward pass; predict_proba/predict_classes are gone from recent Keras releases.
        proba = loaded_model.predict(np.expand_dims(win, axis=0))
        _class = np.argmax(proba, axis=1)
        #print("Max: ", np.max(proba))
        if np.max(proba) >= 0.93:
            filename = "win/{}-{}-{}.png".format(str(l), _class[0], str(round(np.max(proba), 2)))
            cv2.imwrite(filename, winc)
            l = l + 1
    print(resized.shape)
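Each window that passes the threshold already corresponds to a candidate box in the original image. Below is a minimal sketch of that mapping, assuming every pyramid level halves the resolution (cv2.pyrDown rounds up for odd sizes, so the result is only approximate). The name `window_to_box` and its `level` argument are hypothetical and do not appear in the code above.

```python
def window_to_box(col, row, level, window=(48, 48)):
    """Map a window at pyramid `level` back to original-image coordinates.

    `col`, `row` are the window's top-left corner at that level,
    `window` is (height, width), and level 0 is the original image.
    Assumes each level is exactly half the size of the previous one.
    """
    scale = 2 ** level
    x1 = col * scale
    y1 = row * scale
    x2 = (col + window[1]) * scale
    y2 = (row + window[0]) * scale
    return (x1, y1, x2, y2)


# A window at (col=4, row=8) on the third image of the pyramid (level 2).
box = window_to_box(4, 8, 2)
print(box)
```

In the prediction loop the level index could come from `enumerate(pyramid(img))`, and each returned box could be stored together with its class and probability instead of only saving the crop to disk.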
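I keep only the windows whose predicted class probability is at least 0.93, but this still produces a large number of detections. I am not sure how to generate bounding boxes with this approach. Can you suggest a way to obtain bounding boxes using my trained CNN?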
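A common way to collapse many overlapping high-probability windows into a few boxes is non-maximum suppression (NMS). The following is only a sketch, assuming the boxes are given as (x1, y1, x2, y2) tuples in original-image coordinates with one score per box. The names `iou` and `non_max_suppression` and the 0.3 overlap threshold are illustrative choices, not part of the code above.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: return indices of the boxes to keep, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping windows and one far away (made-up example values).
boxes = [(0, 0, 48, 48), (4, 0, 52, 48), (200, 200, 248, 248)]
scores = [0.95, 0.99, 0.94]
kept = non_max_suppression(boxes, scores)
print([boxes[i] for i in kept])
```

Running NMS separately per predicted class would avoid a strong detection of one digit suppressing a neighbouring detection of a different digit.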