Tesseract does not return consistent results

Time: 2016-06-29 16:27:13

Tags: ocr tesseract

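Problem.

I want to screen-scrape a popular smartphone game in order to read the Gold, Elixir and Dark Elixir values from a game instance running on an Android VM image.

However, Tesseract recognizes some samples correctly but fails on others. Testing the same samples with an online OCR service returns correct results.

I use the standard English training data, and I also trained Tesseract on the Supercell-Magic font, which improved accuracy by about 30%.

Samples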

gold_sample_1

gold_sample_1.png

gold_sample_1_processed

magick gold_sample_1.png -fill Black +opaque "#fffbcc" -fill White -opaque "#fffbcc" gold_sample_1_processed.png

gold_sample_1_processed.png
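To make it clearer what the preprocessing command above does, here is a minimal Python sketch of the same idea on an in-memory pixel grid. It is an illustration only, not a replacement for ImageMagick: the function names `hex_to_rgb` and `mask_color` are made up for this example, and it assumes exact colour matching (no `-fuzz`), so anti-aliased edge pixels with slightly different colours would end up black.

```python
# Sketch only: mimics the effect of the ImageMagick options
#   -fill Black +opaque COLOR -fill White -opaque COLOR
# on a grid of (R, G, B) tuples. Assumes exact colour matching (no -fuzz).
# hex_to_rgb and mask_color are hypothetical helper names for this example.

def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def mask_color(pixels, target_hex):
    """Return a new grid: target-coloured pixels become white, all others black."""
    target = hex_to_rgb(target_hex)
    white, black = (255, 255, 255), (0, 0, 0)
    return [[white if px == target else black for px in row] for row in pixels]


if __name__ == "__main__":
    # Tiny 2x2 "image": two pixels in the text colour, two background pixels.
    sample = [
        [(255, 251, 204), (30, 30, 30)],
        [(12, 200, 7), (255, 251, 204)],
    ]
    for row in mask_color(sample, "#fffbcc"):
        print(row)
```

The result is a pure black-and-white image in which only the pixels of the chosen text colour survive, which is the kind of input Tesseract generally handles best.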

Output

40 494
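Since the goal is a numeric resource value, the raw OCR text still has to be turned into a number. The following is a small sketch of one way to do that. It assumes the space in the output is a digit-group separator rather than a boundary between two separate numbers, and `parse_resource_value` is a hypothetical helper name.

```python
def parse_resource_value(ocr_text):
    """Turn raw OCR text such as '40 494' into an int, or None if unusable.

    Assumption: whitespace inside the text only separates digit groups.
    """
    digits = "".join(ocr_text.split())  # drop spaces and newlines
    if not (digits.isascii() and digits.isdigit()):
        return None  # empty output, or stray non-digit characters
    return int(digits)


if __name__ == "__main__":
    print(parse_resource_value("40 494"))
    print(parse_resource_value(""))
```

Returning `None` for empty or non-numeric text makes a failed recognition easy to tell apart from a genuine value of zero.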

gold_sample_3

gold_sample_3.png

gold_sample_3_processed

magick gold_sample_3.png -fill Black +opaque "#ffffff" -fill White -opaque "#ffffff" gold_sample_3_processed.png

gold_sample_3_processed.png

Output

There is nothing in the output file

However, uploading the same image to an online OCR service gives me this:

(screenshot of the online OCR result)

Specs

OS.

Windows 7 x64 SP1

My Win7 still hasn't upgraded itself ninja-style, unlike so many others around the world ;)

Tesseract OCR.

tesseract 3.05.00dev
leptonica-1.73 (Feb  5 2016, 01:13:58) [MSC v.1900 LIB Release x86]
libgif 5.1.2 : libjpeg 9 : libpng 1.6.19 : libtiff 4.0.2 : zlib 1.2.8 : libwebp 0.3.1. 

ImageMagick.

Version: ImageMagick 7.0.2-1 Q8 x86 2016-06-23 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2015 ImageMagick Studio LLC
License: http://www.imagemagick.org/script/license.php
Visual C++: 180040629
Features: Cipher DPC Modules OpenMP
Delegates (built-in): bzlib cairo flif freetype jng jp2 jpeg lcms lqr openexr pangocairo png ps rsvg tiff webp xml zlib

1 answer:

Answer 0 (score: 0)

Solved!

Specify the page segmentation mode (psm) explicitly.

tesseract --help-psm
Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.

Image:

gold_sample_3_processed.png

And the command:

tesseract gold_sample_3_processed.png sample3 -l eng2 -psm 8

gives the output:

954193
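For scripted use, the command above could be wrapped roughly as follows. This is a sketch under a few assumptions: `tesseract` is on the PATH, the result is written to `<output_base>.txt`, and `eng2` is the custom language name from the command above. The helper names `build_tesseract_command` and `run_ocr` are made up for this example. Note that `-psm` is the spelling used by the 3.05 build in this question; newer Tesseract releases expect `--psm`, hence the `psm_flag` parameter.

```python
import subprocess

# Page segmentation modes 0..10, as listed by `tesseract --help-psm` above.
VALID_PSM = range(0, 11)


def build_tesseract_command(image, output_base, lang="eng2", psm=8, psm_flag="-psm"):
    """Build the argument list for a tesseract call with an explicit PSM.

    psm_flag defaults to "-psm" (as in the command above); pass "--psm"
    for newer Tesseract releases.
    """
    if psm not in VALID_PSM:
        raise ValueError("psm must be between 0 and 10, got %r" % (psm,))
    return ["tesseract", image, output_base, "-l", lang, psm_flag, str(psm)]


def run_ocr(image, output_base, **kwargs):
    """Run tesseract and return the recognised text, whitespace trimmed.

    Assumes tesseract is installed and writes to <output_base>.txt.
    """
    subprocess.run(build_tesseract_command(image, output_base, **kwargs), check=True)
    with open(output_base + ".txt", encoding="utf-8") as fh:
        return fh.read().strip()


if __name__ == "__main__":
    # Only prints the command; call run_ocr(...) to actually execute it.
    print(" ".join(build_tesseract_command("gold_sample_3_processed.png", "sample3")))
```

Keeping command construction separate from execution makes it easy to try other modes, for example `psm=7` for a single text line, when a sample comes back empty.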

Thanks anyway, internet strangers.