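I am training a new language with Tesseract OCR for my final-year project.
I created a word-dawg from a word list. However, the combine_tessdata output is the same whether or not I include the word-dawg and the wordlist, so I am not sure that they actually end up in my trained data.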
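The output is as follows:
Offset for type 0 is -1
Offset for type 1 is 140
Offset for type 2 is 3726
Offset for type 3 is 3904
Offset for type 4 is 346848
Offset for type 5 is 347329
Offset for type 6 is -1
Offset for type 7 is -1
Offset for type 8 is -1
Offset for type 9 is -1
Offset for type 10 is -1
Offset for type 11 is -1
Offset for type 12 is -1
Offset for type 13 is 354078
Offset for type 14 is -1
Offset for type 15 is -1
Offset for type 16 is -1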
I believe the offset for type 2 belongs to unicharambigs. Any idea which offset is the word-dawg? And what do the remaining offsets refer to?
Answer 0: (score: 1)
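It may be a file-naming problem. Below is the output from my own training run, which labels each type with the file it corresponds to; note that every file carries the language prefix (here "vie."). An offset of "-1" means the file does not exist. The word-dawg is type 7, and in your output type 7 is -1, so your word-dawg was not picked up.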
Combining tessdata files
Output vie.traineddata created sucessfully.
TessdataManager combined tesseract data files.
Offset for type 0 (vie.config ) is -1
Offset for type 1 (vie.unicharset ) is 140
Offset for type 2 (vie.unicharambigs ) is 15877
Offset for type 3 (vie.inttemp ) is 21397
Offset for type 4 (vie.pffmtable ) is 1466247
Offset for type 5 (vie.normproto ) is 1468147
Offset for type 6 (vie.punc-dawg ) is -1
Offset for type 7 (vie.word-dawg ) is 1513182
Offset for type 8 (vie.number-dawg ) is -1
Offset for type 9 (vie.freq-dawg ) is 1589568
Offset for type 10 (vie.fixed-length-dawgs ) is -1
Offset for type 11 (vie.cube-unicharset ) is -1
Offset for type 12 (vie.cube-word-dawg ) is -1
Offset for type 13 (vie.shapetable ) is 1594178
Offset for type 14 (vie.bigram-dawg ) is -1
Offset for type 15 (vie.unambig-dawg ) is -1
Offset for type 16 (vie.params-training-model ) is -1
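If your version of combine_tessdata prints only the type numbers, a small script can translate them back into file names. The following is a minimal sketch, not part of Tesseract: the names `COMPONENT_NAMES`, `parse_offsets` and `missing_components` are made up for illustration, and the type-to-name table is copied from the labelled output above, so it is only an assumption that it matches other Tesseract versions.

```python
import re

# Component names by type index, copied from the labelled output above
# (language prefix dropped). Assumed to match your Tesseract version.
COMPONENT_NAMES = [
    "config", "unicharset", "unicharambigs", "inttemp", "pffmtable",
    "normproto", "punc-dawg", "word-dawg", "number-dawg", "freq-dawg",
    "fixed-length-dawgs", "cube-unicharset", "cube-word-dawg",
    "shapetable", "bigram-dawg", "unambig-dawg", "params-training-model",
]

# Accepts both "Offset for type 7 is -1" and
# "Offset for type 7 (vie.word-dawg ) is 1513182".
OFFSET_LINE = re.compile(
    r"Offset for type\s+(\d+)\s*(?:\([^)]*\))?\s*is\s+(-?\d+)"
)


def parse_offsets(output):
    """Return {type_index: offset} for every offset line found in the text."""
    return {int(t): int(off) for t, off in OFFSET_LINE.findall(output)}


def missing_components(output):
    """List the names of components whose offset is -1 (file not found)."""
    offsets = parse_offsets(output)
    return [
        COMPONENT_NAMES[t]
        for t, off in sorted(offsets.items())
        if off == -1 and t < len(COMPONENT_NAMES)
    ]


if __name__ == "__main__":
    sample = (
        "Offset for type 6 is -1\n"
        "Offset for type 7 is -1\n"
        "Offset for type 13 is 354078\n"
    )
    print(missing_components(sample))  # ['punc-dawg', 'word-dawg']
```

Paste the full combine_tessdata output into `sample` to see at a glance which files were not found; if word-dawg shows up in the list, check that the file is named with your language prefix and sits in the directory you ran the command from.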