Tesseract cannot recognize captcha text

Time: 2019-04-17 00:32:16

Tags: python python-3.x opencv tesseract python-tesseract

I am trying to recognize the text in a captcha, but I can't get it to work. I am using Python 3, OpenCV and Tesseract.

The simplified code is:

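import cv2
import pytesseract

img_path = "path"

img = cv2.imread(img_path)  # returns None if the path is wrong
img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)  # upscale 2x
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to grayscale

text = pytesseract.image_to_string(img)  # run OCR on the grayscale image
print(text)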

I think I should first remove the colored lines so that only the text is left, and maybe adjust the brightness and contrast. Which filters could I apply?

These are some of the captcha images I need to recognize.

1 answer:

Answer 0 (score: 0)

To recognize captcha text with pytesseract-ocr, you can do the following:

  • Prepare a custom train_set to train your Tesseract instance to recognize the specific font [optional].

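  • The captcha image needs some preprocessing (e.g. *apply a black-and-white filter > upscale > blur > morphological transformations + adaptive thresholding*) to enhance the text and reduce the noise/lines.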

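  • To remove the lines: in the sample images only the text is black and none of the lines are, so you can easily use PIL or OpenCV to turn every non-black pixel white, or even use a dedicated algorithm such as the Hough Line Transform to detect and remove the lines.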

You can learn about all of these filters and algorithms from the official documentation and tutorials on the OpenCV website.