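I am having a problem with Tesseract training for a font I created. After going through the whole process of generating a bunch of Tesseract files and combining them, my Tesseract reads every "7" as "?". The font has all the characters.
I created a unicharambigs file with the following content: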
v1
1 ? 1 7 1
It was saved in Vi in Unix file format and includes a newline character after the last line. It should replace every '?' with '7'.
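To rule out a formatting mistake in the file itself, here is a rough sketch of a checker. It assumes my understanding of the documented v1 format is right: a `v1` header line, then one rule per line with five tab-separated fields (source count, space-separated source characters, target count, space-separated target characters, type 0 or 1). The function name `check_unicharambigs` is my own and is not part of Tesseract.

```python
def check_unicharambigs(data: bytes) -> list[str]:
    """Return a list of formatting problems found in a v1 unicharambigs file.

    Assumption: fields are separated by tabs and each rule has five fields.
    """
    problems = []
    if b"\r" in data:
        problems.append("file contains CR characters (not Unix line endings)")
    if not data.endswith(b"\n"):
        problems.append("file does not end with a newline")

    lines = data.decode("utf-8").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline

    if not lines or lines[0].strip() != "v1":
        problems.append("first line is not the 'v1' version header")

    for number, line in enumerate(lines[1:], start=2):
        fields = line.rstrip("\r").split("\t")
        if len(fields) != 5:
            problems.append(
                f"line {number}: expected 5 tab-separated fields, got {len(fields)}"
            )
            continue
        n_src, src, n_tgt, tgt, kind = fields
        if not n_src.isdigit() or int(n_src) != len(src.split(" ")):
            problems.append(f"line {number}: source count does not match")
        if not n_tgt.isdigit() or int(n_tgt) != len(tgt.split(" ")):
            problems.append(f"line {number}: target count does not match")
        if kind not in ("0", "1"):
            problems.append(f"line {number}: type must be 0 or 1")
    return problems
```

Running it as `check_unicharambigs(open("SmAftersale.unicharambigs", "rb").read())` should return an empty list if the file matches the assumed layout. A rule typed with spaces instead of tabs would be reported as having the wrong number of fields.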
Combining gives me this result:
Combining tessdata files
TessdataManager combined tesseract data files.
Offset for type 0 (SmAftersale.config ) is -1
Offset for type 1 (SmAftersale.unicharset ) is 140
Offset for type 2 (SmAftersale.unicharambigs ) is 3047
Offset for type 3 (SmAftersale.inttemp ) is 3061
Offset for type 4 (SmAftersale.pffmtable ) is 350802
Offset for type 5 (SmAftersale.normproto ) is 351219
Offset for type 6 (SmAftersale.punc-dawg ) is -1
Offset for type 7 (SmAftersale.word-dawg ) is -1
Offset for type 8 (SmAftersale.number-dawg ) is -1
Offset for type 9 (SmAftersale.freq-dawg ) is -1
Offset for type 10 (SmAftersale.fixed-length-dawgs ) is -1
Offset for type 11 (SmAftersale.cube-unicharset ) is -1
Offset for type 12 (SmAftersale.cube-word-dawg ) is -1
Offset for type 13 (SmAftersale.shapetable ) is 357761
Offset for type 14 (SmAftersale.bigram-dawg ) is -1
Offset for type 15 (SmAftersale.unambig-dawg ) is -1
Offset for type 16 (SmAftersale.params-model ) is -1
Output SmAftersale.traineddata created successfully.
The offset for the "SmAftersale.unicharambigs" file is not -1, so I assume the file has been read. However, after all this, Tesseract still reads every '7' as '?'.
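For what it's worth, this is how I am reading the combine output: a small sketch that pulls out the components whose offset is not -1. The helper name `included_components` and the regular expression are my own, written against the log lines above. As far as I can tell, a non-negative offset only shows that the component was packed into the traineddata file, not that the rule is applied during recognition.

```python
import re

# Matches lines such as "Offset for type 2 (SmAftersale.unicharambigs ) is 3047".
OFFSET_LINE = re.compile(
    r"Offset for type\s+(\d+)\s*\(\s*(\S+)\s*\)\s+is\s+(-?\d+)"
)


def included_components(log: str) -> dict[str, int]:
    """Map each component name to its offset, skipping entries reported as -1."""
    return {
        name: int(offset)
        for _type_id, name, offset in OFFSET_LINE.findall(log)
        if int(offset) != -1
    }
```

Applied to the output above, this should list unicharset, unicharambigs, inttemp, pffmtable, normproto and shapetable, which matches my assumption that the ambigs file made it into the combined file.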
What am I doing wrong, or what am I missing?