Question

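I want to recognize the characters on number plates. How do I train tesseract-ocr for these plates on Ubuntu 16.04? I'm not familiar with training, so please help me create a "traineddata" file for recognizing number plates.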

[Sample number plate whose characters I want to detect][1]

[Sample number plate whose characters I want to detect][2]

I have 1000 images of number plates.

Please take a look. Any help would be appreciated.

So I tried the following command. Syntax:

    tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] batch.nochop makebox

My command:

    tesseract eng.arial.plate3655.png eng.arial.plate3655 batch.nochop makebox

But it gave an error:

    Tesseract Open Source OCR Engine v4.1.0-rc1-56-g7fbd with Leptonica
    Error, cannot read input file eng.arial.plate3655.png: No such file or directory
    Error during processing.

After that I tried:

    tesseract plate4.png eng.arial.plate4 batch.nochop makebox

That works, but only for some of the plates. Now I get an error in step 2.

Screenshots are attached.

[Plate 4 image used for training][3]

[Terminal output of step 1 and step 2][4]

[Files generated after step 1 and step 2][5]

[Contents of the files generated after step 1 and step 2][6]




  [1]: https://i.stack.imgur.com/6raR2.png
  [2]: https://i.stack.imgur.com/WuGE6.jpg
  [3]: https://i.stack.imgur.com/BPwDj.png
  [4]: https://i.stack.imgur.com/K6KEu.png
  [5]: https://i.stack.imgur.com/yrjEd.png
  [6]: https://i.stack.imgur.com/wlXFT.png

Answer 1

Creating a .traineddata file for Tesseract 4 (legacy engine training)

{*Note: after installing tesseract, open a terminal (cmd on Windows) and run the following steps.}

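Step 1: Make a box file for each image you want to train on

Syntax:

    tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] batch.nochop makebox

The input image must actually be saved under this name; otherwise tesseract reports "cannot read input file".

Example:

    tesseract own.arial.exp0.jpg own.arial.exp0 batch.nochop makebox

{*Note: after making the box file, we must correct any misrecognized characters in it.}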
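With 1000 plate images, naming files and making boxes by hand is impractical. The following is a minimal POSIX shell sketch, assuming all images sit in one source folder; the helper names `name_images` and `make_boxes` are hypothetical, and `own`/`arial` are only the placeholder language and font labels used in this answer. It copies each image to the `[langname].[fontname].expN.[extension]` name that step 1 expects (an image that was never saved under that name is a likely cause of the "cannot read input file" error in the question) and then runs the step 1 command on each copy.

```shell
# Sketch only: the helper names and the "own"/"arial" labels are placeholders.

# name_images LANG FONT IMAGE...
# Copies each IMAGE to LANG.FONT.expN.EXT (N = 0, 1, 2, ...), the file name
# pattern that step 1 expects. The original images are left untouched.
name_images() {
    lang=$1
    font=$2
    shift 2
    n=0
    for img in "$@"; do
        ext=${img##*.}                       # keep the original extension
        cp "$img" "$lang.$font.exp$n.$ext" || return 1
        n=$((n + 1))
    done
}

# make_boxes IMAGE...
# Runs the step 1 makebox command on every image; tesseract writes
# IMAGE-without-extension.box next to each image.
make_boxes() {
    for img in "$@"; do
        tesseract "$img" "${img%.*}" batch.nochop makebox
    done
}

# Usage, from a working directory (plates/ is an assumed source folder):
#   name_images own arial plates/*.png
#   make_boxes own.arial.exp*.png
```

Each generated `.box` file still has to be corrected by hand before step 2.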

Step 2: Create a .tr file (from the image file and its box file)

Syntax:

    tesseract [langname].[fontname].[expN].[file-extension] [langname].[fontname].[expN] box.train

Example:

    tesseract own.arial.exp0.jpg own.arial.exp0 box.train
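Step 2 can be looped over all images in the same way. This sketch uses a hypothetical helper, `make_tr_files`, which skips any image whose `.box` file is missing instead of letting tesseract fail partway through the batch.

```shell
# make_tr_files IMAGE...
# Step 2 for every image: tesseract reads IMAGE plus the matching (corrected)
# .box file and writes IMAGE-without-extension.tr.
make_tr_files() {
    for img in "$@"; do
        base=${img%.*}
        if [ ! -f "$base.box" ]; then
            echo "skipping $img: $base.box not found" >&2
            continue
        fi
        tesseract "$img" "$base" box.train
    done
}

# Usage:
#   make_tr_files own.arial.exp*.png
```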

Step 3: Extract the character set from the box file (the output of this command is the unicharset file)

Syntax:

    unicharset_extractor [langname].[fontname].[expN].box

Example:

    unicharset_extractor own.arial.exp0.box
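When you train on more than one image, `unicharset_extractor` accepts several box files in one call, so the unicharset covers every character that occurs on any plate. A small wrapper sketch (the name `extract_unicharset` is hypothetical) that refuses to run without input and checks that `unicharset` was written to the current directory:

```shell
# extract_unicharset BOXFILE...
# Step 3 over all box files at once; writes ./unicharset.
extract_unicharset() {
    if [ "$#" -eq 0 ]; then
        echo "no .box files given" >&2
        return 1
    fi
    # Succeed only if the extractor ran and actually produced the file.
    unicharset_extractor "$@" && [ -f unicharset ]
}

# Usage:
#   extract_unicharset own.arial.exp*.box
```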

Step 4: Create a font_properties file that describes your font.

Syntax:

    echo "[fontname] [italic (0 or 1)] [bold (0 or 1)] [monospace (0 or 1)] [serif (0 or 1)] [fraktur (0 or 1)]" > font_properties

Example:

    echo "arial 0 0 1 0 0" > font_properties
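The font name in font_properties has to match the `[fontname]` part of the `.tr` file names (`arial` for `own.arial.exp0.tr`), and each flag must be 0 or 1. A sketch of a hypothetical helper that writes the line only after validating the six fields:

```shell
# write_font_properties FONT ITALIC BOLD MONOSPACE SERIF FRAKTUR
# Step 4: writes a one-line font_properties file after checking the flags.
write_font_properties() {
    if [ "$#" -ne 6 ]; then
        echo "usage: write_font_properties FONT ITALIC BOLD MONOSPACE SERIF FRAKTUR" >&2
        return 1
    fi
    for flag in "$2" "$3" "$4" "$5" "$6"; do
        case $flag in
            0|1) ;;
            *) echo "flags must be 0 or 1, got '$flag'" >&2; return 1 ;;
        esac
    done
    printf '%s %s %s %s %s %s\n' "$@" > font_properties
}

# Usage (same content as the example above):
#   write_font_properties arial 0 0 1 0 0
```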

Step 5: Train the data.

Syntax:

    mftraining -F font_properties -U unicharset -O [langname].unicharset [langname].[fontname].[expN].tr

Example:

    mftraining -F font_properties -U unicharset -O own.unicharset own.arial.exp0.tr
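For several images, `mftraining` takes all the `.tr` files at once. A sketch with a hypothetical `run_mftraining` helper that first checks that the files from steps 3 and 4 are present in the current directory:

```shell
# run_mftraining LANG TRFILE...
# Step 5 over every .tr file; writes LANG.unicharset among other outputs.
run_mftraining() {
    lang=$1
    shift
    for f in font_properties unicharset; do
        if [ ! -f "$f" ]; then
            echo "missing $f; run steps 3 and 4 first" >&2
            return 1
        fi
    done
    mftraining -F font_properties -U unicharset -O "$lang.unicharset" "$@"
}

# Usage:
#   run_mftraining own own.arial.exp*.tr
```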

Step 6: Run cntraining on the same .tr file.

Syntax:

    cntraining [langname].[fontname].[expN].tr

Example:

    cntraining own.arial.exp0.tr

{*Note: after steps 5 and 6, four files have been created (shapetable, inttemp, pffmtable, normproto).}
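`cntraining` likewise accepts all the `.tr` files. This sketch (the helper name `run_cntraining` is hypothetical) runs it and then checks that the four files listed in the note exist before you move on to step 7:

```shell
# run_cntraining TRFILE...
# Step 6 over all .tr files, followed by a check for the four output files.
run_cntraining() {
    cntraining "$@" || return 1
    missing=0
    for f in shapetable inttemp pffmtable normproto; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   run_cntraining own.arial.exp*.tr
```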

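Step 7: Rename the four files (shapetable, inttemp, pffmtable, normproto) to ([langname].shapetable, [langname].inttemp, [langname].pffmtable, [langname].normproto)

Syntax (Windows cmd):

    rename filename1 filename2

On Ubuntu, `rename` is the Perl rename utility, which uses a different syntax, so use `mv filename1 filename2` there instead.

Example:

    rename shapetable own.shapetable
    rename inttemp own.inttemp
    rename pffmtable own.pffmtable
    rename normproto own.normproto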
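On Ubuntu the same renaming can be done with `mv`. A minimal sketch; the helper name `prefix_outputs` is hypothetical:

```shell
# prefix_outputs LANG
# Step 7 on Linux. Ubuntu's `rename` expects a Perl expression rather than
# an old and a new file name, so plain `mv` is used here.
prefix_outputs() {
    lang=$1
    for f in shapetable inttemp pffmtable normproto; do
        mv "$f" "$lang.$f" || return 1
    done
}

# Usage:
#   prefix_outputs own
```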

Step 8: Create the .traineddata file

Syntax (the trailing dot after the language name is required):

    combine_tessdata [langname].

Example:

    combine_tessdata own.
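To check the result, the new language can be used from the working directory. A sketch, assuming `own.traineddata` is in the current directory and using a hypothetical helper `ocr_plate`: `--oem 0` selects the legacy engine, which is what these steps train, and `--psm 7` (treat the image as a single text line) is an assumption that suits one-line plates. If your build does not pick the file up via `--tessdata-dir .`, copy it into its tessdata directory instead.

```shell
# ocr_plate LANG IMAGE
# Prints the text recognized in IMAGE using LANG.traineddata from the
# current directory.
ocr_plate() {
    lang=$1
    img=$2
    if [ ! -f "$lang.traineddata" ]; then
        echo "$lang.traineddata not found in $(pwd)" >&2
        return 1
    fi
    # --oem 0: legacy engine; --psm 7: single text line (assumed plate layout)
    tesseract "$img" stdout --tessdata-dir . -l "$lang" --oem 0 --psm 7
}

# Usage:
#   ocr_plate own plate4.png
```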

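{*Note: I used only one image, exp0, to create the trained data. To train on more than one image, repeat steps 1 and 2 for exp1, exp2 ... expN and pass all of the resulting files to steps 3, 5 and 6.}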

{Reference: http://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-characters-recognition/}

How to create a traineddata file for Tesseract 4.1.0
