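I have cropped portions of images that contain digits. Each crop is read by Tesseract for OCR. Unfortunately, for some images the crop taken from the original image is not ideal.
Some artifacts/residue remain at the top and bottom of the image and keep Tesseract from recognizing the characters. I would like to get rid of these artifacts and obtain a clean image containing only the digits.
First, I tried a simple approach: I take the first row of pixels as the reference. For each column along the x axis, if an artifact is found there (i.e., a white pixel, since the image is binarized and inverted), I remove it by walking down along y and clearing pixels until the next black pixel. The code for this approach is as follows: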
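import cv2
inp = cv2.imread("testing_file.tif")
inp = cv2.cvtColor(inp, cv2.COLOR_BGR2GRAY)
# Binarize. With THRESH_OTSU the 150 is ignored and Otsu picks the threshold;
# THRESH_BINARY_INV makes the dark digits and artifacts white (255).
_, inp = cv2.threshold(inp, 150, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
ax = inp.shape[1]  # image width
ay = inp.shape[0]  # image height
out = inp.copy()
# For each column, clear white pixels from the top row downward
# until the first black pixel is reached.
for i in range(ax):
    j = 0
    while j < ay:
        if out[j, i] == 255:
            out[j, i] = 0
        else:
            break
        j += 1
# Invert back to dark digits on a white background for Tesseract.
out = cv2.bitwise_not(out)
cv2.imwrite('output.png', out)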
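But the result was not good at all.
Then I came across the flood_fill function from scipy (here), but found it both time-consuming and inefficient. A similar question was asked on SO here, but it didn't help much. Maybe a k-nearest-neighbors approach is worth considering? I also found that methods which merge neighboring pixels under certain conditions are called region-growing methods, the most common of which is single linkage (here).
What would you suggest for removing the artifacts at the top and bottom?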
Answer 0 (score: 2)
Here's a simple approach:
After converting to grayscale, we apply Otsu's threshold to obtain a binary image
# Read in image, convert to grayscale, and Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
Next, we create a long horizontal kernel and dilate to connect the digits together
# Create special horizontal kernel and dilate
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (70,1))
dilate = cv2.dilate(thresh, horizontal_kernel, iterations=1)
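From here we detect the horizontal lines and sort the contours to find the largest one. The idea is that the largest contour will be the middle section of the digits, where all digits are "complete". Any smaller contour will be a partial or cut-off digit, so we filter those out here. We draw the largest contour onto a mask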
# Detect horizontal lines, sort for largest contour, and draw on mask
mask = np.zeros(image.shape, dtype=np.uint8)
detected_lines = cv2.morphologyEx(dilate, cv2.MORPH_OPEN, horizontal_kernel, iterations=1)
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
for c in cnts:
    cv2.drawContours(mask, [c], -1, (255,255,255), -1)
    break
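Now that we have the contour of the desired digits, we simply bitwise-and the mask with the original image and color the background white to get the result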
# Bitwise-and to get result and color background white
mask = cv2.cvtColor(mask,cv2.COLOR_BGR2GRAY)
result = cv2.bitwise_and(image,image,mask=mask)
result[mask==0] = (255,255,255)
Full code
import cv2
import numpy as np
# Read in image, convert to grayscale, and Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Create special horizontal kernel and dilate
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (70,1))
dilate = cv2.dilate(thresh, horizontal_kernel, iterations=1)
# Detect horizontal lines, sort for largest contour, and draw on mask
mask = np.zeros(image.shape, dtype=np.uint8)
detected_lines = cv2.morphologyEx(dilate, cv2.MORPH_OPEN, horizontal_kernel, iterations=1)
cnts = cv2.findContours(detected_lines, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
for c in cnts:
    cv2.drawContours(mask, [c], -1, (255,255,255), -1)
    break
# Bitwise-and to get result and color background white
mask = cv2.cvtColor(mask,cv2.COLOR_BGR2GRAY)
result = cv2.bitwise_and(image,image,mask=mask)
result[mask==0] = (255,255,255)
cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.imshow('result', result)
cv2.waitKey()