Pytesseract OCR does not return the correct result on a captcha image

Time: 2019-09-13 16:12:28

Tags: python opencv ocr tesseract python-tesseract

I am using tesseract 4.0.0-beta.1

I have the following image

ocr image

I converted this image using OpenCV

converted image

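import cv2

# image is the path to the captcha file
img = cv2.imread(image, cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, None, fx=5, fy=5, interpolation=cv2.INTER_LINEAR)  # upscale 5x
img = cv2.medianBlur(img, 9)  # suppress speckle noise
_, img = cv2.threshold(img, 185, 255, cv2.THRESH_BINARY)  # binarize at 185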

My tesseract command

tesseract image.png stdout -l eng-best --oem 1 --psm 7

Result obtained: NVRG nk

But the result should be: nvRGnk

1 answer:

Answer 0: (score: 0)

Starting from your converted image, it only needs a little more filtering. With the code below, Pytesseract returns:

nvRGnk

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

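# Load the converted image as grayscale and invert it so the text is white on black
image = cv2.imread('2.png', 0)
image = 255 - image
# Close small gaps in the strokes, then thicken them slightly
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
close = cv2.morphologyEx(image, cv2.MORPH_CLOSE, kernel, iterations=2)
dilate_kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2, 2))
dilate = cv2.dilate(close, dilate_kernel, iterations=1)
# Invert back to black text on white before OCR
result = 255 - dilate

# --psm 13 treats the image as a single raw text line
data = pytesseract.image_to_string(result, lang='eng', config='--psm 13')
print(data)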

cv2.imshow('result', result)
cv2.waitKey()