Character segmentation and recognition of unevenly spaced digits

Asked: 2017-11-13 20:54:37

Tags: python image opencv image-segmentation

I have an image of a number, shown below: [Image]
I segmented the number above into its digits by applying adaptive thresholding, detecting contours, and keeping only bounding rectangles whose height and width are greater than 15, which produced the segmented digits below.
[Segmented digit images: one crop per detected digit group]

Instead of the output above, I want to segment the number in the image so that I get each digit separately; after resizing each crop to (28, 28), the result can be fed to a CNN trained on MNIST for better prediction of the individual digits.
So, is there any other neat way of segmenting the number in this image into individual digits?

One approach mentioned here suggests sliding a fixed-size (green) window across the image and training a neural network to detect the digits. How would such a network be trained to classify the digits? This approach avoids the OpenCV step of separating each individual digit, but wouldn't simply sliding a window over the whole image be somewhat expensive? And how should positive and negative examples be handled during training (should I create a separate dataset... the positive examples could be MNIST digits, but what about the negatives?)

Segmentation:

# cv2, numpy and imutils are required; clear_border comes from scikit-image
import cv2
import numpy as np
import imutils
from skimage.segmentation import clear_border

img = cv2.imread('Image')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

blur = cv2.GaussianBlur(gray,(3,3), 0)
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_MEAN_C,\
            cv2.THRESH_BINARY_INV, 7,10)
thresh = clear_border(thresh)

# find contours in the thresholded image, then initialize the
# list of group locations
groupCnts = cv2.findContours(thresh.copy(), cv2.RETR_TREE,
    cv2.CHAIN_APPROX_SIMPLE)
groupCnts = groupCnts[0] if imutils.is_cv2() else groupCnts[1]
groupLocs = []

clone = np.dstack([gray.copy()] * 3)
# loop over the group contours
for (i, c) in enumerate(groupCnts):
    # compute the bounding box of the contour
    (x, y, w, h) = cv2.boundingRect(c)
    # only accept the contour region as a grouping of characters if
    # the ROI is sufficiently large
    if w >= 15 and h >= 15:
        print (i, (x, y, w, h))
        cv2.rectangle(clone, (x,y), (x+w, y+h), (255,0,0), 1)
        groupLocs.append((x, y, w, h))
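
A rough sketch of how the boxes found above could then be ordered left to right and turned into (28, 28) crops for the MNIST classifier (this uses the groupLocs and thresh variables from the code above; it is only a sketch, not a tested solution):

# Sort the bounding boxes by x coordinate and extract a square, resized crop per box.
digits = []
for (x, y, w, h) in sorted(groupLocs, key=lambda b: b[0]):
    roi = thresh[y:y + h, x:x + w]
    # pad the shorter side so the digit is not distorted by the resize
    pad = abs(h - w) // 2
    if h > w:
        roi = cv2.copyMakeBorder(roi, 0, 0, pad, pad, cv2.BORDER_CONSTANT, value=0)
    else:
        roi = cv2.copyMakeBorder(roi, pad, pad, 0, 0, cv2.BORDER_CONSTANT, value=0)
    roi = cv2.resize(roi, (28, 28), interpolation=cv2.INTER_AREA)
    digits.append(roi)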

Sliding window:

# joblib, OpenCV, numpy, time and scikit-image (hog, clear_border) are required;
# sliding_window is a small helper generator (a sketch is given after this block)
import time
import cv2
import numpy as np
import joblib
from skimage.feature import hog
from skimage.segmentation import clear_border

clf = joblib.load("digits_cls.pkl")    # MNIST-trained classifier
img = cv2.imread('Image', 0)
winW, winH = (22, 40)
cv2.imshow("Window0", img)
cv2.waitKey(1)

blur = cv2.GaussianBlur(img, (5,5),0)
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,\
            cv2.THRESH_BINARY,11,2)
thresh = clear_border(thresh) 

for (x, y, window) in sliding_window(img, stepSize=10, windowSize=(winW, winH)):
    if (window.shape[0] != winH or window.shape[1] != winW):
        continue
    clone = img.copy()
    roi = thresh[y:y+winH, x:x+winW]
    roi = cv2.resize(roi, (28, 28), interpolation=cv2.INTER_AREA)
    roi = cv2.dilate(roi, (3, 3))
    cv2.imshow("Window1", roi)
    cv2.waitKey(1)
    roi_hog_fd = hog(roi, orientations=9, pixels_per_cell=(14, 14), cells_per_block=(1, 1), visualise=False)
    nbr = clf.predict(np.array([roi_hog_fd], 'float64'))
    print (nbr)

    # since we do not have a classifier, we'll just draw the window
    clone = img.copy()
    cv2.rectangle(clone, (x, y), (x + winW, y + winH), (0, 255, 0), 2)
    cv2.imshow("Window2", clone)
    cv2.waitKey(1)
    time.sleep(0.95)
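
The sliding_window helper used above is just a small generator; a minimal sketch of it (along the lines of the common tutorial implementation):

# Yields the top-left corner and the crop of every window as it steps across the image.
def sliding_window(image, stepSize, windowSize):
    for y in range(0, image.shape[0], stepSize):
        for x in range(0, image.shape[1], stepSize):
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])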

Strange output (it predicts a digit even for blank windows): 522637753787357777722
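
One thing that might reduce the spurious predictions on blank windows is to skip windows that contain almost no foreground before calling the classifier, something like this inside the loop above (a sketch; the 5% threshold is an arbitrary choice):

# Skip windows whose thresholded ROI is almost entirely background.
roi = thresh[y:y + winH, x:x + winW]
if cv2.countNonZero(roi) < 0.05 * winW * winH:
    continue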

Separating joined digits:

# img: the crop containing the joined digits (loaded earlier); os and cv2 are required
import os
import cv2

h, w = img.shape[:2]
count = 0
iw = 15        # fixed slice width in pixels
dw = w         # remaining width
sw = 0         # current start column
while dw > 0:
    # cut a fixed-width vertical slice starting at column sw
    new_img = img[:, sw:sw + iw]
    dw -= iw
    sw += iw
    if dw - iw < 0:
        iw = w                     # last slice: take all of the remaining width
    new = os.path.join('amount/', 'amount_' + str(count) + '.png')
    cv2.imwrite(new, new_img)
    count += 1                     # advance the output file index

Output:
[joined-digit crop] -> [separated digit crops]
[joined-digit crop] -> [separated digit crops]

I found a way to separate these joined digits and feed them to the trained classifier, but the output is still not accurate.

The steps I used:
(i) Extract the first image.
(ii) Segment the first image into separate images, i.e. obtain the second set of images.
(iii) Check whether an image's width exceeds a certain threshold; if it does, segment it further into individual digits (as with the joined digits above).
(iv) Feed all the individual digits obtained after step (iii) to the MNIST classifier to get a digit prediction for each resized image.
Lengthy, right?
Is there any other efficient way to convert the first image into digits directly (yes, I tried pytesseract too!)?

1 Answer:

Answer 0 (score: 3):

Training a new neural network would be an elegant solution if you have enough time and resources.

To separate each digit individually, you could try inverting the image intensities so that the handwriting is white and the background black. Then project the values onto the horizontal axis (sum the pixel values in each column) and look for peaks. Each peak location should indicate a new character position.

Additional smoothing of the projection profile should make the character positions easier to pinpoint.
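
A rough sketch of that idea, assuming a binary image thresh with white handwriting on a black background (find_peaks from SciPy is used here only for illustration):

import numpy as np
from scipy.signal import find_peaks

profile = thresh.sum(axis=0).astype(np.float64)   # one value per image column
kernel = np.ones(9) / 9.0                         # simple moving-average smoothing
smooth = np.convolve(profile, kernel, mode='same')
peaks, _ = find_peaks(smooth, distance=10)        # roughly one peak per character
print(peaks)                                      # candidate character centre columns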