Question

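I am retraining an Inception ResNet v2 model to recognise a 3-digit sequence in a specific font. The sequences are synthetically generated in black font on a white background. My thinking is that letting the model see only this particular background will help me eliminate false detections (in this case, any other 3-digit sequence that is not on a white background), because the model should not predict a sequence with high probability when its background is not white. Is this a valid assumption?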
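One way to probe this assumption is to render the same digits on a white and on a non-white background and compare the model's confidence on the two. The following is only a minimal sketch, assuming Pillow is available. The helper names `make_sample` and `make_pair`, the image size, and the default bitmap font are placeholders and not the actual generation pipeline or training font.

```python
import random

from PIL import Image, ImageDraw, ImageFont


def make_sample(digits, background=(255, 255, 255), size=(96, 32)):
    """Render `digits` in black on a solid background of the given colour."""
    img = Image.new("RGB", size, background)
    draw = ImageDraw.Draw(img)
    # Assumption: the default bitmap font stands in for the training font.
    draw.text((10, 10), digits, fill=(0, 0, 0), font=ImageFont.load_default())
    return img


def make_pair(rng):
    """Return the digit string plus a white and a non-white rendering of it."""
    digits = f"{rng.randrange(1000):03d}"
    # Each channel stays in 60..199, so the background is never white
    # and the black digits remain visible against it.
    colour = tuple(rng.randrange(60, 200) for _ in range(3))
    return digits, make_sample(digits), make_sample(digits, colour)


rng = random.Random(0)
digits, white_img, coloured_img = make_pair(rng)
```

If the model assigns similar probabilities to `white_img` and `coloured_img`, it has not learned to depend on the background.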

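PS: I previously tried extracting the text from the image with Tesseract. I used the EAST text detector for detection, which gave me bounding boxes for the text. I then ran OCR on those boxes with pytesseract, but it always returned an empty string. Also, when the digits are rotated, the EAST text detector fails to detect the rotated digit sequence. So I was left with no option but to train a neural network model to perform the text detection and extraction.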

Code for pytesseract:

import cv2
import numpy as np
import pytesseract
from pytesseract import image_to_string
from PIL import Image

# Bounding box returned by EAST (four corner points)
refPt = [(486, 302), (540, 308), (538, 328), (484, 323)]
roi_corners = np.array(refPt[0:4], np.int32).reshape((-1, 1, 2))

inp_img = cv2.imread("1.jpg")

# Build a mask that is white inside the box and black everywhere else
mask = np.zeros(inp_img.shape, dtype=np.uint8)
channel_count = inp_img.shape[2]
ignore_mask_color = (255,) * channel_count
mask = cv2.fillPoly(mask, [roi_corners], ignore_mask_color)

# Keep only the pixels inside the box, then run OCR on the full-size result
masked_image = cv2.bitwise_and(inp_img, mask)
print(image_to_string(Image.fromarray(masked_image), lang='eng'))

Training a model to be biased towards a given background

0 answers: