Question

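I am trying to remove the lines from this captcha image so that I can read the text string in it with an OCR tool such as tesseract. I am using the code described in How I developed a captcha cracker for my University's website and in "image enhancement", so that the image can be recognized more reliably. Here is my Python code so far (I am actually new to Python):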

from PIL import Image, ImageEnhance


im = Image.open("img2.png")
nx, ny = im.size
image = im.resize((int(nx*5), int(ny*5)), Image.BICUBIC)
image.save("img1_enchance.png")

image = image.convert("L") # Grayscale conversion
width, height = image.size
cropped_image = image.crop((0, 0, (460/3), 200))
cropped_image.save("img1_crop.png")

pixel_matrix = cropped_image.load()
croppedwidth, croppedheight = cropped_image.size
for col in range(0, croppedheight): # Height
    for row in range(0, croppedwidth): # Width
        if pixel_matrix[row, col] != 0:
            pixel_matrix[row, col] = 255
cropped_image.save("img1_text1.png")

for column in range(1, croppedheight - 1):
    for row in range(1, croppedwidth - 1):
        if pixel_matrix[row, column] == 0 \
            and pixel_matrix[row, column - 1] == 255 and pixel_matrix[row, column + 1] == 255:
            pixel_matrix[row, column] = 255
        if pixel_matrix[row, column] == 0 \
            and pixel_matrix[row - 1, column] == 255 and pixel_matrix[row + 1, column] == 255:
            pixel_matrix[row, column] = 255
cropped_image.save("img1_text2.png")

The problem is that instead of getting the text string, I get noisy lines, as shown in the images below:

(img1_text1.png)

(img1_text2.png)
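
One direction that might be worth trying is sketched below. It assumes the noise lines are thinner than the character strokes, which may not hold for every captcha. The idea is to binarize the image, apply a max filter so that thin dark lines are swallowed by the white background, then apply a min filter to give the surviving strokes their thickness back. The function name `remove_thin_lines` and its `threshold` and `size` parameters are hypothetical and would need tuning; only standard Pillow calls (`Image.point`, `ImageFilter.MaxFilter`, `ImageFilter.MinFilter`) are used.

```python
from PIL import Image, ImageFilter


def remove_thin_lines(img, threshold=128, size=3):
    """Sketch: drop dark lines thinner than `size` pixels from a dark-on-white image.

    `threshold` and `size` are illustrative defaults, not tuned values.
    `size` must be odd (a Pillow rank-filter requirement).
    """
    gray = img.convert("L")
    # Binarize: pixels brighter than the threshold become white, the rest black.
    binary = gray.point(lambda p: 255 if p > threshold else 0)
    # Max filter: each pixel takes the brightest value in its window, so dark
    # features narrower than the window disappear and thick ones shrink.
    eroded = binary.filter(ImageFilter.MaxFilter(size))
    # Min filter: grows the remaining dark strokes back to about their old width.
    restored = eroded.filter(ImageFilter.MinFilter(size))
    return restored
```

It could be called as `remove_thin_lines(Image.open("img2.png")).save("img2_clean.png")`, where the output filename is only an example. If the image has already been enlarged 5x as in the code above, the lines are also 5x thicker, so `size` would have to be increased accordingly, or the filter applied before resizing.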

I have a collection of captcha images like this one in the google-drive link:

Any help is greatly appreciated, thank you very much.

Removing line noise from captcha images with Python

0 Answers: