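I am developing an application that uses OpenCV and Tesseract to extract text from images. I am stuck on images that contain white text, and on images that mix white text with text in other colors.
If the text is not white, I can extract it easily, but if the text is white, extraction does not work. I use Tesseract to extract the text. Below is the image-processing code I run before passing the image to Tesseract: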
import collections
import os

import cv2
import numpy as np

image = cv2.imread(imgPath)
filename = getFileName()
img2gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
#cv2.imwrite('python_scripts/temp/img2gray_' + filename, img2gray)
# remove shadows from the image
rgb_planes = cv2.split(img2gray)
result_planes = []
result_norm_planes = []
for plane in rgb_planes:
    dilated_img = cv2.dilate(plane, np.ones((1, 1), np.uint8))
    bg_img = cv2.medianBlur(dilated_img, 21)
    diff_img = 255 - cv2.absdiff(plane, bg_img)
    norm_img = diff_img
    norm_img = cv2.normalize(diff_img, norm_img, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8UC1)
    result_planes.append(diff_img)
    result_norm_planes.append(norm_img)
result = cv2.merge(result_planes)
result_norm = cv2.merge(result_norm_planes)
img2gray = result
origImage = image.copy()
_, new_img = cv2.threshold(img2gray, 80, 255, cv2.THRESH_BINARY)  # for black text, use cv2.THRESH_BINARY_INV
# kernel shape controls the direction of dilation: a larger x dilates more horizontally, a larger y more vertically
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (13, 13))
dilated = cv2.dilate(new_img, kernel, iterations=9)  # more iterations means more dilation
cv2.imwrite('python_scripts/temp/dilated_' + filename, dilated)
# get contours; [-2] picks the contour list in both OpenCV 3.x and 4.x
contours = cv2.findContours(dilated, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)[-2]
(imgH, imgW) = img2gray.shape[:2]
#logger.info("imgH: " + str(imgH) + ", imgW: " + str(imgW))
ctr = 0
contoursSort = {}
for contour in contours:
    # to sort contours from top to bottom
    [x, y, w, h] = cv2.boundingRect(contour)
    contoursSort[y] = contour
orderedContours = collections.OrderedDict(sorted(contoursSort.items()))
for key, contour in orderedContours.items():
    # get rectangle bounding contour
    (x, y, w, h) = cv2.boundingRect(contour)
    # Don't plot small false positives that aren't text
    if w < 35 and h < 35:
        continue
    length = x + w
    breadth = y + h
    area = length * breadth
    cv2.rectangle(origImage, (x, y), (x + w, y + h), (255, 255, 0), 2)
    # don't consider a contour that touches the image border
    if x != 0 and y != 0 and x + w != imgW and y + h != imgH:
        #logger.info("x: " + str(x) + ", y: " + str(y) + ", length: " + str(length)
        #            + ", breadth: " + str(breadth) + ", area: " + str(area))
        croppedImg = origImage[y:(y + h), x:(x + w)]
        croppedGray = cv2.cvtColor(croppedImg, cv2.COLOR_BGR2GRAY)
        ctr = ctr + 1
        th3 = cv2.adaptiveThreshold(croppedGray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 75, 10)
        #th3_height, th3_width = th3.shape[:2]
        #th3 = cv2.resize(th3, (2*th3_width, 2*th3_height), interpolation=cv2.INTER_CUBIC)
        th3 = cv2.GaussianBlur(th3, (5, 5), 0)
        tessImgPath = 'python_scripts/temp/th3_' + str(ctr) + "_" + filename
        cv2.imwrite(tessImgPath, th3)
        tessText = runTesseract(tessImgPath)
        os.remove(tessImgPath)
        logger.info("tess text: " + str(tessText))
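Please guide me on how to process the image with OpenCV so that white text and text in other colors can be extracted in a single pass.
One approach I am trying now is to measure the white portion of the image and, if it is smaller than the portion in other colors, apply bitwise_xor to the image before passing it to Tesseract.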
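The idea described above could be sketched roughly as follows. This is only an illustration, not the code from the question: the helper name `normalize_polarity` is hypothetical, it assumes the region has already been binarized to 0/255 (for example by `cv2.threshold`), and it assumes that text covers less area than the background. XOR with 255 flips every pixel, which is the same as inverting the image.

```python
import numpy as np


def normalize_polarity(binary):
    """Return a 0/255 image with dark text on a light background.

    Hypothetical helper. Assumes `binary` is a uint8 array holding only
    0 and 255, and that text occupies the minority of the pixels.
    """
    white = np.count_nonzero(binary == 255)
    black = binary.size - white
    if white < black:
        # White is the minority, so the text is probably white:
        # XOR with 255 inverts the region to black text on white.
        return binary ^ np.uint8(255)
    # Already dark text on a light background; return an untouched copy.
    return binary.copy()


# Tiny synthetic region: a white "stroke" on a black background.
crop = np.zeros((4, 6), dtype=np.uint8)
crop[1:3, 1:3] = 255
fixed = normalize_polarity(crop)
```

Applied per cropped region rather than to the whole image, a check like this lets each region be inverted independently. It will guess wrong when the text fills most of the region, so the minority-pixel rule should be treated as a heuristic.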
Thanks in advance.
Answer 0 (score: 0)
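I am also working on text extraction, and I have seen Tesseract fail to extract text that comes in multiple colors.
White is not the problem. I tried extraction on your images. Tesseract gave accurate output for the second image, because its text is a single color.
The result for the first image was poor, because half of the text is black and the other half is white with a gray outline. An adaptive threshold turns the gray outline of the white text black, which gives a pure black-and-white image to extract text from. So try this: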
import cv2
import numpy as np
# read image
img = cv2.imread("XDJFq.jpg")
# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# do adaptive threshold on gray image
thresh = cv2.adaptiveThreshold(gray, 250, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 11)
# write results to disk
cv2.imwrite("Full_white.jpg", thresh)
# display it
cv2.imshow("thresh", thresh)
cv2.waitKey(0)
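To see why a mean adaptive threshold copes with both dark and light text, here is a NumPy-only approximation of it. The function name `adaptive_mean_threshold` and its parameters are made up for illustration; it mimics the rule "pixel is white if it is brighter than the local mean minus C", but it is not OpenCV's implementation, and border handling or rounding may differ from `cv2.adaptiveThreshold`.

```python
import numpy as np


def adaptive_mean_threshold(gray, block_size=11, c=11, max_value=255):
    """Simplified mean adaptive threshold (illustrative, not OpenCV's code)."""
    k = block_size
    pad = k // 2
    # Replicate the edges so every pixel has a full k x k neighbourhood.
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = gray.shape
    window_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
                  - integral[k:k + h, :w] + integral[:h, :w])
    local_mean = window_sum / (k * k)
    # A pixel becomes white when it is brighter than (local mean - c).
    return np.where(gray > local_mean - c, max_value, 0).astype(np.uint8)


# Dark stroke on a light background.
dark_text = np.full((21, 21), 200, dtype=np.uint8)
dark_text[:, 10] = 20
dark_out = adaptive_mean_threshold(dark_text)

# Light stroke on a dark background.
light_text = np.full((21, 21), 50, dtype=np.uint8)
light_text[:, 10] = 230
light_out = adaptive_mean_threshold(light_text)
```

In both synthetic cases the flat background ends up white. The dark stroke becomes black, while the light stroke stays white but gains a black halo where the neighbouring pixels fall below the local mean, which matches the black outline around the white text described above.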