I am trying to recognize the text in a captcha, but I can't get it to work. I am using Python 3, OpenCV, and Tesseract.
The simplified code is:
import cv2
import pytesseract

img_path = "path"  # path to the captcha image
img = cv2.imread(img_path)
# Upscale 2x, then convert to grayscale before running OCR
img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print(pytesseract.image_to_string(img))
I think I should remove the colored lines first so that only the text is left, and maybe adjust the brightness and contrast. Which filter could I apply?
These are some of the images I need to recognize.
Answer 0 (score: 0)
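To recognize captcha text with pytesseract, you can do the following.
Prepare a custom train_set to train your Tesseract instance to recognize the specific font [optional]
The captcha image needs some preprocessing (e.g. *apply a black-and-white filter > scale (up) > blur > morphological transformations + adaptive thresholding*) to enhance the text and reduce noise/lines.
To remove the lines: in the sample images only the text is black and none of the lines are, so you can easily convert every non-black pixel to white using PIL or OpenCV, or even use a dedicated algorithm such as the Hough Line Transform to detect and remove the lines.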
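The "convert every non-black pixel to white" idea can be sketched with NumPy alone. This is only an illustration: the helper name keep_dark_pixels and the cutoff of 60 are assumptions made here, not values from the question, and the cutoff would need tuning against the real captchas.

```python
import numpy as np


def keep_dark_pixels(img_bgr, max_value=60):
    """Return a single-channel image with near-black pixels kept black
    and everything else turned white.

    max_value is an assumed cutoff: a pixel counts as "black" only if
    its B, G and R values are all <= max_value.
    """
    dark = (img_bgr <= max_value).all(axis=2)
    out = np.full(img_bgr.shape[:2], 255, dtype=np.uint8)
    out[dark] = 0
    return out


# Tiny synthetic image: white background, one red line, a few dark "text" pixels.
sample = np.full((4, 6, 3), 255, dtype=np.uint8)
sample[1, :] = (0, 0, 255)      # red line (OpenCV images are BGR)
sample[2, 1:4] = (10, 10, 10)   # dark text pixels
cleaned = keep_dark_pixels(sample)
```

An image loaded with cv2.imread can be passed in directly, since OpenCV returns a NumPy array in BGR order.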
You can learn about all of these filters and algorithms from the official documentation and tutorials on the OpenCV website.