Question

I am trying to train Tesseract 3.02 on Ubuntu 14.04. I followed the guide in Cedric's blog.

First, I tried to generate a box file with the following command:

tesseract eng.mr.exp0.jpg eng.mr.exp0 batch.nochop makebox

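However, that command generates a box file with a single line, treating the whole image as one box (it should generate a box file with 6 lines). So I used jTessBoxEditor to edit the box file and create 6 boxes with the proper coordinates and characters. Now, when I try to train Tesseract with that box file using the command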

tesseract eng.mr.exp0.jpg eng.mr.exp0.box nobatch box.train

I get the following error:

Tesseract Open Source OCR Engine v3.03 with Leptonica
FAIL!
APPLY_BOXES: boxfile line 1/0 ((20,24),(95,192)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 2/7 ((96,24),(171,192)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 3/0 ((172,24),(248,192)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 4/3 ((248,24),(324,192)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 5/3 ((324,24),(400,192)): FAILURE! Couldn't find a matching blob
FAIL!
APPLY_BOXES: boxfile line 6/0 ((400,24),(476,192)): FAILURE! Couldn't find a matching blob
APPLY_BOXES:
   Boxes read from boxfile:       6
   Boxes failed resegmentation:       6
APPLY_BOXES: Unlabelled word at :Bounding box=(0,19)->(480,192)
   Found 0 good blobs.
   1 remaining unlabelled words deleted.
Generated training data for 0 words

What am I doing wrong?

The image I used is here.

Answer 1

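That image is very dirty! You need to clean it up before any OCR software can recognize it or be trained on it. Once you have preprocessed the image, you can use the mirc.traineddata from this Tesseract forum post.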
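As a rough illustration of what "cleaning up" can mean, here is a minimal sketch of one common preprocessing step, global binarization with Otsu's threshold. It is not necessarily the cleanup this particular image needs. It assumes the image has already been loaded as rows of 8-bit grayscale values (nested Python lists); the function names `otsu_threshold` and `binarize` are made up for this example, and in practice you would load and save the pixels with an imaging library.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold (0-255) for a flat list of 8-bit gray values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of gray values at or below the candidate threshold
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no dark pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor)
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(rows):
    """Map each pixel to 0 (ink) or 255 (background) using Otsu's threshold."""
    flat = [p for row in rows for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in rows]


# Tiny made-up "image": dark strokes (about 30-45) on a bright background (about 200-220)
sample = [
    [30, 40, 200, 210],
    [35, 220, 205, 45],
]
print(binarize(sample))
```

A single global threshold like this only helps when the dirt is clearly lighter than the characters. With uneven lighting or heavy speckle, an adaptive threshold or a despeckling pass is usually a better fit.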
