Unable to recognize the digit 5

Time: 2018-10-27 05:49:32

Tags: python opencv ocr tesseract python-tesseract

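I am using Pytesseract to recognize an image of the digit 5, but I was surprised to find that it still fails to recognize the image, even after applying various filters such as GaussianBlur and thresholding, and after applying dilation and erosion to remove noise.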

Filters tried:

        1: cv2.threshold(cv2.GaussianBlur(img, (9, 9), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1],
        2: cv2.threshold(cv2.GaussianBlur(img, (7, 7), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1],
        3: cv2.threshold(cv2.GaussianBlur(img, (5, 5), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1],
        4: cv2.threshold(cv2.medianBlur(img, 5), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1],
        5: cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1],
        6: cv2.adaptiveThreshold(cv2.GaussianBlur(img, (5, 5), 0), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2),
        7: cv2.adaptiveThreshold(cv2.medianBlur(img, 3), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2),

Applying dilation and erosion to remove some noise:

    kernel = np.ones((1, 1), np.uint8)
    img = cv2.dilate(img, kernel, iterations=1)
    img = cv2.erode(img, kernel, iterations=1)

Trained data:

eng.traineddata

Original image:

[image]

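I'm not sure what is going wrong here. I read the Tesseract documentation and applied all the preprocessing steps mentioned there. Can someone help me figure out the problem?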

1 answer:

Answer 0: (score: 0)

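Try the `--psm 10` option, which tells Tesseract to treat the image as a single character. The following should produce the recognized text:

    import io

    import pytesseract
    import requests
    from PIL import Image

    # Download the image and run OCR in single-character mode (--psm 10)
    response = requests.get('https://i.stack.imgur.com/ZcPqGs.jpg')
    text = pytesseract.image_to_string(Image.open(io.BytesIO(response.content)), lang='eng', config='--psm 10')
    print(text)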
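
As a possible extension of the `--psm 10` suggestion above, here is a minimal sketch that builds the config string and optionally restricts Tesseract to digits. The helper names `build_tesseract_config` and `recognize_single_char` are hypothetical, and the use of `tessedit_char_whitelist` is my assumption rather than part of the original answer; that variable may be ignored by the LSTM engine in some Tesseract 4.0 builds, so treat it as something to try, not a guaranteed fix.

```python
def build_tesseract_config(psm=10, whitelist=None):
    """Build a Tesseract config string such as '--psm 10'.

    psm=10 asks Tesseract to treat the image as a single character.
    whitelist (optional) limits recognition to the given characters.
    """
    parts = ["--psm {}".format(psm)]
    if whitelist:
        # -c VAR=VALUE sets a Tesseract config variable on the command line
        parts.append("-c tessedit_char_whitelist={}".format(whitelist))
    return " ".join(parts)


def recognize_single_char(image, whitelist="0123456789"):
    """Run OCR on a PIL image that is assumed to contain one character."""
    # Imported lazily so build_tesseract_config works without pytesseract
    import pytesseract

    config = build_tesseract_config(psm=10, whitelist=whitelist)
    return pytesseract.image_to_string(image, lang="eng", config=config).strip()


print(build_tesseract_config(10, "0123456789"))
# prints: --psm 10 -c tessedit_char_whitelist=0123456789
```

To use it, pass the image opened with `Image.open(...)` to `recognize_single_char`. Passing `whitelist=None` falls back to plain `--psm 10`, which matches the answer above.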