Tesseract does not apply the unicharambigs file from my training data

Date: 2016-10-03 11:49:25

Tags: ocr tesseract

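I am having a problem with Tesseract training data that I created for a font I use. After going through the whole process of generating the Tesseract files and combining them, Tesseract reads every "7" as "?". The font contains both characters.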

I created a unicharambigs file with the following content:

v1
1   ?   1   7   1

It was saved in Vi with the unix fileformat and has a newline character after the last line. It should replace every '?' with '7'.
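The properties listed above (Unix line endings, trailing newline, consistent counts in the rule) can be checked mechanically. The following is a minimal, unofficial sketch. It assumes the v1 layout seen in the file above: a `v1` header line, then whitespace-separated rules made of a source count, the source characters, a target count, the target characters, and a type flag of 0 or 1. The helper names `parse_rule` and `check_unicharambigs` are hypothetical and not part of Tesseract.

```python
def parse_rule(line):
    """Split one rule into (source chars, target chars, type flag is 1).

    Assumed layout: <n_src> <src...> <n_dst> <dst...> <type>
    """
    fields = line.split()
    try:
        n_src = int(fields[0])
        src = fields[1:1 + n_src]
        n_dst = int(fields[1 + n_src])
        dst = fields[2 + n_src:2 + n_src + n_dst]
        kind = fields[2 + n_src + n_dst]
    except IndexError:
        raise ValueError("too few fields in rule: %r" % line)
    # The declared counts must account for every field on the line.
    if len(fields) != 3 + n_src + n_dst or kind not in ("0", "1"):
        raise ValueError("malformed rule: %r" % line)
    return src, dst, kind == "1"


def check_unicharambigs(text):
    """Return a list of problems found in the file content (empty if none)."""
    problems = []
    if "\r" in text:
        problems.append("carriage returns found (not Unix line endings)")
    if not text.endswith("\n"):
        problems.append("no newline after the last line")
    lines = text.split("\n")
    if lines[0].strip() != "v1":
        problems.append("first line is not the 'v1' header")
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # skip blank lines, including the one after the final newline
        try:
            parse_rule(line)
        except ValueError as error:
            problems.append("line %d: %s" % (number, error))
    return problems


if __name__ == "__main__":
    sample = "v1\n1\t?\t1\t7\t1\n"
    print(check_unicharambigs(sample))
```

An empty list only means the file is well formed under these assumptions. It says nothing about whether Tesseract applies the rule during recognition.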

Combining the files gives me this output:

 
    Combining tessdata files
    TessdataManager combined tesseract data files.
    Offset for type  0 (SmAftersale.config                ) is -1
    Offset for type  1 (SmAftersale.unicharset            ) is 140
    Offset for type  2 (SmAftersale.unicharambigs         ) is 3047
    Offset for type  3 (SmAftersale.inttemp               ) is 3061
    Offset for type  4 (SmAftersale.pffmtable             ) is 350802
    Offset for type  5 (SmAftersale.normproto             ) is 351219
    Offset for type  6 (SmAftersale.punc-dawg             ) is -1
    Offset for type  7 (SmAftersale.word-dawg             ) is -1
    Offset for type  8 (SmAftersale.number-dawg           ) is -1
    Offset for type  9 (SmAftersale.freq-dawg             ) is -1
    Offset for type 10 (SmAftersale.fixed-length-dawgs    ) is -1
    Offset for type 11 (SmAftersale.cube-unicharset       ) is -1
    Offset for type 12 (SmAftersale.cube-word-dawg        ) is -1
    Offset for type 13 (SmAftersale.shapetable            ) is 357761
    Offset for type 14 (SmAftersale.bigram-dawg           ) is -1
    Offset for type 15 (SmAftersale.unambig-dawg          ) is -1
    Offset for type 16 (SmAftersale.params-model          ) is -1
    Output SmAftersale.traineddata created successfully.

The offset for the "SmAftersale.unicharambigs" file is not -1, so I assume the file was read. Yet after all of this, Tesseract still reads every '7' as '?'.
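The reading of the offsets used here (an offset of -1 means the component is absent from the combined file) can be applied to the whole log at once. The sketch below is a rough illustration that parses output in the format shown above. The function name `included_components` and the regular expression are assumptions written for this log, not part of any Tesseract tool.

```python
import re

# Matches lines such as:
#   Offset for type  2 (SmAftersale.unicharambigs         ) is 3047
OFFSET_LINE = re.compile(
    r"Offset for type\s+(\d+)\s+\(\s*(\S+)\s*\)\s+is\s+(-?\d+)"
)


def included_components(log_text):
    """Map each component name to its offset, skipping entries reported as -1."""
    found = {}
    for match in OFFSET_LINE.finditer(log_text):
        name = match.group(2)
        offset = int(match.group(3))
        if offset != -1:
            found[name] = offset
    return found


if __name__ == "__main__":
    log = """
    Offset for type  0 (SmAftersale.config                ) is -1
    Offset for type  1 (SmAftersale.unicharset            ) is 140
    Offset for type  2 (SmAftersale.unicharambigs         ) is 3047
    """
    for name, offset in included_components(log).items():
        print(name, offset)
```

This only confirms that the unicharambigs file was packed into the traineddata file. Whether the rule inside it takes effect at recognition time is a separate question.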

What am I doing wrong, or what am I missing?

0 answers:

No answers yet