Question

I have installed Tesseract in my Linux environment.

It works when I run something like:

# tesseract myPic.jpg /output

But my picture has some small labels, and Tesseract does not detect them.

Is there an option to set a pitch or something like that?

Example of text labels:

With this picture, Tesseract does not recognize any value...

But with this picture:

I get the following output:

J8

J7A-J7B P7 \

2
40 50 0 180 190

200

P1 P2 7

110 110
\ l

For example, in this case Tesseract does not see the 90 (top left)...

I think it is just a matter of setting an option or something like that, isn't it?

Thanks

Answer 1

To get accurate results from Tesseract (or any OCR engine), you need to follow some guidelines, as described in my answer to this post: Junk results when using Tesseract OCR and tess-two

Here is the gist of it:

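Use a high-resolution image (upscale if needed); 300 DPI is the minimum

Make sure there are no shadows or bends in the image

If there is any skew, fix the image in code before running OCR

Use a dictionary to help get good results

Adjust the text size (12 pt font is ideal)

Binarize the image and use image-processing algorithms to remove noise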
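
As an illustration of the binarization guideline above, here is a minimal sketch in plain Python. It uses Otsu's method, which is one common way to pick a global threshold; that choice is an assumption made for this example, not necessarily what any particular OCR toolkit does internally. The function names `otsu_threshold` and `binarize` are hypothetical, and the image is assumed to be a 2-D list of 8-bit grayscale values.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for an iterable of 8-bit gray values (0-255)."""
    # Build a histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of the gray values of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels in the dark class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(rows):
    """Map a 2-D list of gray values to 0 (at or below threshold) or 255."""
    t = otsu_threshold(p for row in rows for p in row)
    return [[0 if p <= t else 255 for p in row] for row in rows]


if __name__ == "__main__":
    # Tiny hypothetical image: dark text pixels (30-40) on a light background.
    image = [[30, 35, 200],
             [210, 40, 220]]
    print(binarize(image))
```

A single global threshold like this tends to work only when lighting is even; images with shadows usually need a locally adaptive threshold instead, which is one reason the guideline about avoiding shadows matters.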

It is also recommended to spend some time training the OCR engine to get better results, as described at this link: Training Tesseract

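I took the 2 images that you shared and ran some image processing on them using the LEADTOOLS SDK (disclaimer: I am an employee of this company). With the processed images I was able to get better results than you were getting, but since the original images are not of the best quality, the result is still not 100%. Here is the code I used to try to fix the images: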

//initialize the codecs class
using (RasterCodecs codecs = new RasterCodecs())
{
   //load the file
   using (RasterImage img = codecs.Load(filename))
   {
      //Run the image processing sequence starting by resizing the image
      double newWidth = (img.Width / (double)img.XResolution) * 300;
      double newHeight = (img.Height / (double)img.YResolution) * 300;
      SizeCommand sizeCommand = new SizeCommand((int)newWidth, (int)newHeight, RasterSizeFlags.Resample);
      sizeCommand.Run(img);

      //binarize the image
      AutoBinarizeCommand autoBinarize = new AutoBinarizeCommand();
      autoBinarize.Run(img);

      //change it to 1BPP
      ColorResolutionCommand colorResolution = new ColorResolutionCommand();
      colorResolution.BitsPerPixel = 1;
      colorResolution.Run(img);

      //save the image as PNG
      codecs.Save(img, outputFile, RasterImageFormat.Png, 0);
   }
}
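
The resize step in the code above scales the image so that its physical size stays the same at 300 DPI. The arithmetic can be sketched in a few lines of Python; the function name `size_for_dpi` is hypothetical, and integer arithmetic is used here in place of the floating-point computation followed by a cast, so this is an illustration of the formula rather than a port of the SDK call.

```python
TARGET_DPI = 300  # minimum resolution suggested in the guidelines


def size_for_dpi(width_px, height_px, x_dpi, y_dpi, target_dpi=TARGET_DPI):
    """Return the pixel size that keeps the physical size at target_dpi.

    new = pixels / current_dpi * target_dpi, rounded down.
    """
    if x_dpi <= 0 or y_dpi <= 0:
        # Some files carry no resolution metadata; the formula is undefined then.
        raise ValueError("image resolution must be positive")
    new_width = width_px * target_dpi // x_dpi
    new_height = height_px * target_dpi // y_dpi
    return new_width, new_height


if __name__ == "__main__":
    # A hypothetical 400 x 300 pixel screenshot tagged as 96 DPI.
    print(size_for_dpi(400, 300, 96, 96))  # (1250, 937)
```

Note that upscaling this way only interpolates existing pixels; it cannot restore detail that the original small labels never had, which fits the observation that the results were still short of 100%.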

Here is the output image from this process:

