Question

I'm having some trouble using Tesseract and OpenCV (.NET / C#) to OCR this image:

PasswordBox:

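At first the unprocessed, white image gave me nothing but blank lines, but then I applied grayscale and tried inverting the colors. The results were better.

Then I used the OpenCV methods SmoothGaussian and SmoothMedian (with just 1 as the parameter). That is better again, but still not all characters are recognized: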

n IHHIEI 5

4 7

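Some of the digits are recognized.

I also tried splitting the image into three rows (30 pixels high each), and the results were worse (this is only the first row)...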

[9| I2l l‘ll

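I can see the 9 and the 2 in between the I and l characters, but the 1s in this row are not recognized: l‘ll

The code I'm using is the following: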

    try
    {
        Bitmap imageBMP = ConvertToBitmap(cheminFichier.ToString());

        // default option for the colors?

        imageBMP = GrayScale(imageBMP);

        if (isGreyScale == true)
        {
            // invert every pixel
            for (int y = 0; (y <= (imageBMP.Height - 1)); y++)
            {
                for (int x = 0; (x <= (imageBMP.Width - 1)); x++)
                {
                    Color inv = imageBMP.GetPixel(x, y);
                    inv = Color.FromArgb(255, (255 - inv.R), (255 - inv.G), (255 - inv.B));
                    imageBMP.SetPixel(x, y, inv);
                }
            }
        }

        imageBMP.Save(tempDir.ToString() + "\\img.png", System.Drawing.Imaging.ImageFormat.Png);

        Image<Gray, Byte> temp = new Image<Gray, Byte>((Bitmap)Bitmap.FromFile((tempDir.ToString() + "\\img.png")));
        //Image<Gray, Byte> temp = new Image<Gray, Byte>((Bitmap)Bitmap.Fr);
        temp = temp.SmoothGaussian(1);
        temp = temp.SmoothMedian(1);
        temp = temp.SmoothBlur(240, 90);

        // this is how I split the result into 3 images
        Bitmap img1 = cropAtRect(temp.Bitmap, new Rectangle(0, 0, 240, 30));
        img1.Save(tempDir.ToString() + "\\1.png");

        Bitmap img2 = cropAtRect(temp.Bitmap, new Rectangle(0, 30, 240, 30));
        img2.Save(tempDir.ToString() + "\\2.png");

        Bitmap img3 = cropAtRect(temp.Bitmap, new Rectangle(0, 60, 240, 30));
        img3.Save(tempDir.ToString() + "\\3.png");
    }
    catch (Exception ex)
    {
        Console.WriteLine("erreur pendant la conversion/copie du fichier image(s)." + ex.ToString());
    }

All of these PNGs are stored in a temporary directory. I then read and convert every PNG in that directory and concatenate the text:

    using (var engine = new TesseractEngine(@"./tessdata", "fra", EngineMode.Default))
    {
        //Console.ReadLine();
        string fichierDeSortie = AppDomain.CurrentDomain.BaseDirectory + "output.txt";
        string retourFichier = "";
        foreach (FileInfo fichier in infoDir.GetFiles())
        {
            if (fichier.Extension.ToString().ToUpper() == ".PNG")
            {
                string actualFilePAth = fichier.FullName.ToString();
                Bitmap bmp = (Bitmap)Bitmap.FromFile(actualFilePAth);
                // the bitmap can be disabled with the -B option

                // convert pix to image
                //System.Drawing.Bitmap tmppage = new System.Drawing.Bitmap(System.Drawing.Bitmap.FromFile(actualFilePAth));

                if (BitmapActive == true)
                {
                    Console.WriteLine("\r\n Bitmap Activé !\r\n");
                    using (var page = engine.Process(bmp))
                    {
                        var text = page.GetText();
                        retourFichier = retourFichier + text.ToString();
                        //Console.WriteLine(text.ToString());
                    }
                }
                else
                {
                    Tesseract.Pix pix = Tesseract.PixConverter.ToPix(bmp);
                    using (var page = engine.Process(pix))
                    {
                        var text = page.GetText();
                        retourFichier = retourFichier + text.ToString();
                        //Console.WriteLine(text.ToString());
                    }
                }
                Console.WriteLine("decodé : " + fichier.FullName.ToString());
                System.IO.File.WriteAllText(fichierDeSortie, retourFichier);
                //fichier.Delete();
            }
        }
    }
    Console.WriteLine("Conversion du pdf en png OK.");
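I noticed "compression artifacts" (the small pixelated blobs that appear when a JPG is compressed) around some of the digits that are not recognized. The content will always be digits.

What can I do to get better results? I've been going in circles trying things with OpenCV, without much luck... Tesseract works fine on many of the PDFs I convert every day.

Thanks in advance, IVAN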

Answer 1

Tesseract is very sensitive to noise, so you have to clean up the image as much as possible. The code below is in Python, but you can easily port it to C#.

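import cv2
import pytesseract

if __name__ == '__main__':

    # Load as 3-channel BGR so the grayscale conversion below always works.
    image = cv2.imread('image.jpg', cv2.IMREAD_COLOR)

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Otsu picks the threshold automatically.
    ret, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    # Inverted copy: dark regions of the image become white blobs in the mask.
    mask = cv2.bitwise_not(binary)

    # Dilate so that neighbouring dark pixels merge into connected blobs.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_DILATE, kernel, iterations=3)

    (rows, cols) = image.shape[:2]

    # findContours returns 3 values in OpenCV 3.x and 2 in 4.x;
    # the contour list is the second-to-last item in both.
    contours = cv2.findContours(mask, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]

    # Erase the small (character-sized) blobs from the mask,
    # so that only the large dark regions remain in it.
    for cnt in contours:
        (x, y, w, h) = cv2.boundingRect(cnt)
        if w < cols / 4 and h < rows / 4:
            cv2.drawContours(mask, [cnt], -1, 0, -1)

    # Paint those large regions white in the binary image; the characters stay.
    binary[mask == 255] = 255

    text = pytesseract.image_to_string(binary, lang='eng', config='--tessdata-dir /tesseract/tessdata/data --psm 6 --oem 2')
    print(text)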

Answer 2

OpenCV + Tesseract can be very powerful when used correctly. What you are trying to achieve can be summarized as follows.

OpenCV

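Convert the image to grayscale
Apply a threshold to the grayscale image
Invert the image so that the text is white and the background is black
- If you are trying to detect whole words rather than individual characters, dilate the inverted Mat with a structuring element (horizontal spacing 1)
findContours(mat, contours, mode, method)
For each contour, get a Rect rect = boundingRect(contours[i]); you now have the location of each text/pixel group you are after
Use mat(rect) to extract that specific part of the image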

Tesseract

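It is best to start by downloading eng.traineddata from GitHub.
If you are working with a specific character set, say digits only, then once Tesseract is initialized you have two variables: tessedit_char_blacklist (enforced, so all letters and special characters should go here) and tessedit_char_whitelist (a suggestion; put the digits here).
Alignment also matters, so if possible crop all the text that should sit on one line and join it into a single linear Mat (hconcat() helps) before you run OCR.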
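As a rough Python sketch of the last two points, the snippet below joins the three 30-pixel rows from the question into one line and builds a digits-only option string. The helper names are hypothetical, `np.hstack` stands in for `hconcat()`, and `--psm 7` (treat the image as a single text line) is an assumed choice. How strictly `tessedit_char_whitelist` is honored may depend on the Tesseract version and engine mode, so treat this as something to experiment with rather than a guaranteed fix.

```python
import numpy as np

DIGITS = "0123456789"


def rows_to_single_line(image, row_height):
    """Cut an image into horizontal strips and join them side by side."""
    height = image.shape[0]
    strips = [
        image[top:top + row_height]
        for top in range(0, height - row_height + 1, row_height)
    ]
    return np.hstack(strips)


def digits_only_config(psm=7):
    """Build a Tesseract option string restricted to digits."""
    # --psm 7 asks Tesseract to treat the image as a single text line.
    return f"--psm {psm} -c tessedit_char_whitelist={DIGITS}"


def ocr_digits(image, row_height=30):
    """Run OCR on the joined line (needs pytesseract and Tesseract installed)."""
    # Imported lazily so the helpers above work without Tesseract.
    import pytesseract

    line = rows_to_single_line(image, row_height)
    return pytesseract.image_to_string(line, config=digits_only_config())


if __name__ == "__main__":
    # Blank placeholder with the 240x90 size implied by the question's crops.
    page = np.zeros((90, 240), dtype=np.uint8)
    print(rows_to_single_line(page, 30).shape)
    print(digits_only_config())
```

In C#, the same variables can presumably be set on the engine before processing; check the wrapper's documentation for the exact call.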

I develop in C++, so you will need to find the equivalent functions in your preferred language.

OCR of digits in a heavily compressed JPG using Tesseract and OpenCV (C# .NET)
