Tesseract recognizes the entire image as a single box?

Time: 2014-06-20 07:30:19

Tags: tesseract

I am trying to train tesseract 3.02 on Ubuntu 14.04. I followed the guide in Cedric's blog.

First, I tried to generate a box file with the following command:

tesseract eng.mr.exp0.jpg eng.mr.exp0 batch.nochop makebox

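However, the command above generates a box file with a single line that treats the whole image as one box (it should generate a box file with 6 lines). So I used jTessBoxEditor to edit the box file and create 6 boxes with the appropriate coordinates and characters. Now, when I try to train tesseract with that box file using the command: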
tesseract eng.mr.exp0.jpg eng.mr.exp0.box nobatch box.train

I get the error:

Tesseract Open Source OCR Engine v3.03 with Leptonica
FAIL!
APPLY_BOXES: boxfile line 1/0 ((20,24),(95,192)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 2/7 ((96,24),(171,192)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 3/0 ((172,24),(248,192)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 4/3 ((248,24),(324,192)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 5/3 ((324,24),(400,192)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 6/0 ((400,24),(476,192)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:       6
   Boxes failed resegmentation:       6
APPLY_BOXES: Unlabelled word at :Bounding box=(0,19)->(480,192)
   Found 0 good blobs.
   1 remaining unlabelled words deleted.
Generated training data for 0 words
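
For reference, a Tesseract 3.x box file holds one character per line in the form "symbol left bottom right top page", with the origin at the bottom-left corner of the image. The sketch below is a rough sanity check of the six boxes reported in the log. The helper names are hypothetical, and the page number 0 and the 480x192 image size are assumptions (the size is inferred from the bounding box printed in the log, not confirmed). A clean result only rules out a malformed box file; it does not explain the "Couldn't find a matching blob" failures.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_lines(text: str) -> List[Box]:
    """Parse box-file text: one 'symbol left bottom right top page' entry per line."""
    boxes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 6:
            raise ValueError(f"line {line_no}: expected 6 fields, got {len(parts)}")
        left, bottom, right, top, page = (int(p) for p in parts[1:])
        boxes.append(Box(parts[0], left, bottom, right, top, page))
    return boxes


def check_boxes(boxes: List[Box], width: int, height: int) -> List[str]:
    """Return readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    for i, b in enumerate(boxes, start=1):
        if not (b.left < b.right and b.bottom < b.top):
            problems.append(f"box {i} ({b.symbol}): empty or inverted")
        if b.left < 0 or b.bottom < 0 or b.right > width or b.top > height:
            problems.append(f"box {i} ({b.symbol}): outside the {width}x{height} image")
    return problems


# Coordinates copied from the APPLY_BOXES log; the page number 0 is assumed.
SAMPLE = """\
0 20 24 95 192 0
7 96 24 171 192 0
0 172 24 248 192 0
3 248 24 324 192 0
3 324 24 400 192 0
0 400 24 476 192 0
"""

boxes = parse_box_lines(SAMPLE)
# Assumed image size of 480x192, taken from the bounding box in the log.
print(len(boxes), check_boxes(boxes, width=480, height=192))
```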

What mistake am I making?

The image used is here.

1 answer:

Answer 0: (score: 0)

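That image is very dirty! You need to clean it up before any OCR software can recognize it or be trained on it. Once the image has been preprocessed, you can use the mirc.traineddata from this Tesseract forum post.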
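
One common cleanup step is binarization: turning the grayscale image into pure black text on a pure white background. The following is only a minimal sketch of Otsu's global thresholding in pure Python, assuming the image has already been loaded as rows of 0-255 gray values. The function names and the tiny sample image are made up for illustration; for a real image you would normally load and threshold it with an image library instead, and a very noisy picture may need more than a single global threshold.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale values


def otsu_threshold(image: Gray) -> int:
    """Return the threshold that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    n_dark = 0    # number of pixels with value <= t
    sum_dark = 0  # sum of those pixel values
    for t in range(256):
        n_dark += hist[t]
        sum_dark += t * hist[t]
        if n_dark == 0:
            continue  # no dark class yet
        n_light = total - n_dark
        if n_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / n_dark
        mean_light = (sum_all - sum_dark) / n_light
        between = n_dark * n_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image: Gray, threshold: int) -> Gray:
    """Map pixels <= threshold to 0 (ink) and all others to 255 (paper)."""
    return [[0 if v <= threshold else 255 for v in row] for row in image]


# Made-up sample: dark strokes (30-40) on an uneven light background (200-220).
noisy = [
    [200, 210, 35, 205],
    [40, 215, 30, 220],
    [208, 38, 212, 202],
]
t = otsu_threshold(noisy)
cleaned = binarize(noisy, t)
print(t, cleaned)
```

After a step like this, the box-generation and training commands from the question can be rerun on the cleaned image.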