Question

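I am developing an application that uses OpenCV and Tesseract to extract text from images. I am stuck on images that contain white text, and on images that mix white text with text in other colors.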

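If the text is not white, I can extract it easily, but if the text is white, extraction does not work. I am using Tesseract to extract the text from the image. Below is the image-processing code I run before passing the image to Tesseract: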

import collections
import os

import cv2
import numpy as np

# imgPath, getFileName(), runTesseract() and logger are defined elsewhere in the application
image = cv2.imread(imgPath)
filename = getFileName()

img2gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# cv2.imwrite('python_scripts/temp/img2gray_' + filename, img2gray)

# remove shadows from the image
rgb_planes = cv2.split(img2gray)
result_planes = []
result_norm_planes = []
for plane in rgb_planes:
    dilated_img = cv2.dilate(plane, np.ones((1, 1), np.uint8))
    bg_img = cv2.medianBlur(dilated_img, 21)
    diff_img = 255 - cv2.absdiff(plane, bg_img)
    norm_img = diff_img
    norm_img = cv2.normalize(diff_img, norm_img, alpha=0, beta=255,
                             norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8UC1)
    result_planes.append(diff_img)
    result_norm_planes.append(norm_img)

result = cv2.merge(result_planes)
result_norm = cv2.merge(result_norm_planes)
img2gray = result

origImage = image.copy()

# for black text, use cv2.THRESH_BINARY_INV
_, new_img = cv2.threshold(img2gray, 80, 255, cv2.THRESH_BINARY)

# the kernel size controls the direction of dilation: a larger x dilates more
# horizontally, a larger y dilates more vertically
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (13, 13))
dilated = cv2.dilate(new_img, kernel, iterations=9)  # more iterations, more dilation
cv2.imwrite('python_scripts/temp/dilated_' + filename, dilated)

# get contours; [-2] selects the contour list on both OpenCV 3 and OpenCV 4
contours = cv2.findContours(dilated, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)[-2]

(imgH, imgW) = img2gray.shape[:2]
# logger.info("imgH: " + str(imgH) + ", imgW: " + str(imgW))
ctr = 0

# sort contours from top to bottom
contoursSort = {}
for contour in contours:
    [x, y, w, h] = cv2.boundingRect(contour)
    contoursSort[y] = contour
orderedContours = collections.OrderedDict(sorted(contoursSort.items()))

for key, contour in orderedContours.items():
    # get the rectangle bounding the contour
    (x, y, w, h) = cv2.boundingRect(contour)

    # don't plot small false positives that aren't text
    if w < 35 and h < 35:
        continue

    length = x + w
    breadth = y + h
    area = length * breadth
    cv2.rectangle(origImage, (x, y), (x + w, y + h), (255, 255, 0), 2)

    # don't consider a contour that touches the border
    if x != 0 and y != 0 and x != imgW and y != imgH:
        # logger.info("x: " + str(x) + ", y: " + str(y) + ", length: " + str(length)
        #             + ", breadth: " + str(breadth) + ", area: " + str(area))
        croppedImg = origImage[y:(y + h), x:(x + w)]
        croppedGray = cv2.cvtColor(croppedImg, cv2.COLOR_BGR2GRAY)
        ctr = ctr + 1
        th3 = cv2.adaptiveThreshold(croppedGray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                    cv2.THRESH_BINARY, 75, 10)
        # th3_height, th3_width = th3.shape[:2]
        # th3 = cv2.resize(th3, (2 * th3_width, 2 * th3_height), interpolation=cv2.INTER_CUBIC)
        th3 = cv2.GaussianBlur(th3, (5, 5), 0)
        tessImgPath = 'python_scripts/temp/th3_' + str(ctr) + "_" + filename
        cv2.imwrite(tessImgPath, th3)
        tessText = runTesseract(tessImgPath)
        os.remove(tessImgPath)
        logger.info("tess text: " + str(tessText))

Please guide me on how to process the image with OpenCV so that white text and text in other colors can both be extracted in one pass.

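One approach I am trying now is to measure the white portion of the image and, if it is smaller than the portion covered by other colors, apply bitwise_xor to the image before passing it to Tesseract.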

Thanks in advance.

Answer 1

I am also working on text extraction, in a case where Tesseract cannot extract text that has multiple colors.

White is not the problem. I tried extraction on your images. Tesseract gave accurate output for the second image, because its text is a single color.

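However, the result for the first image is poor, because half of the text is black and the other half is white with a gray outline. To fix this, apply an adaptive threshold (the Python code in this answer) to get a purely black-and-white image.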


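Adaptive thresholding turns the gray outline of the white text black, so the text can then be extracted from the result. The code is: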

import cv2
import numpy as np

# read image
img = cv2.imread("XDJFq.jpg")

# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# do adaptive threshold on gray image
thresh = cv2.adaptiveThreshold(gray, 250, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 11)

# write results to disk
cv2.imwrite("Full_white.jpg", thresh)

# display it
cv2.imshow("thresh", thresh)
cv2.waitKey(0)
