Tesseract OCR: word-dawg not included by combine_tessdata

Time: 2016-02-25 19:16:37

Tags: ocr tesseract

I am using Tesseract OCR to train a new language for my final-year project.

I created a word-dawg from a word list. However, the combine_tessdata result is the same whether or not I include the word-dawg and the word list, so I am not sure whether they actually end up in my trained data.

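The output is as follows:

Offset for type 0 is -1
Offset for type 1 is 140
Offset for type 2 is 3726
Offset for type 3 is 3904
Offset for type 4 is 346848
Offset for type 5 is 347329
Offset for type 6 is -1
Offset for type 7 is -1
Offset for type 8 is -1
Offset for type 9 is -1
Offset for type 10 is -1
Offset for type 11 is -1
Offset for type 12 is -1
Offset for type 13 is 354078
Offset for type 14 is -1
Offset for type 15 is -1
Offset for type 16 is -1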

I believe type 2 is the unicharambigs. Any idea which type is the word-dawg? And what are the remaining types?

1 answer:

Answer 0: (score: 1)

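It may be a file-naming problem: each component file has to carry the language prefix (for example, vie.word-dawg). Below is the output from my own training run. An offset of "-1" means the file was not found.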

Combining tessdata files
Output vie.traineddata created sucessfully.
TessdataManager combined tesseract data files.
Offset for type  0 (vie.config                ) is -1
Offset for type  1 (vie.unicharset            ) is 140
Offset for type  2 (vie.unicharambigs         ) is 15877
Offset for type  3 (vie.inttemp               ) is 21397
Offset for type  4 (vie.pffmtable             ) is 1466247
Offset for type  5 (vie.normproto             ) is 1468147
Offset for type  6 (vie.punc-dawg             ) is -1
Offset for type  7 (vie.word-dawg             ) is 1513182
Offset for type  8 (vie.number-dawg           ) is -1
Offset for type  9 (vie.freq-dawg             ) is 1589568
Offset for type 10 (vie.fixed-length-dawgs    ) is -1
Offset for type 11 (vie.cube-unicharset       ) is -1
Offset for type 12 (vie.cube-word-dawg        ) is -1
Offset for type 13 (vie.shapetable            ) is 1594178
Offset for type 14 (vie.bigram-dawg           ) is -1
Offset for type 15 (vie.unambig-dawg          ) is -1
Offset for type 16 (vie.params-training-model ) is -1
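
To check a combine_tessdata log without reading the offsets by eye, a small script can map each type index to its component name and list the ones reported as -1. The following is a minimal sketch, not part of Tesseract: the helper names `parse_offsets` and `missing_components` are made up for illustration, and the index-to-name table is copied from the log above, so it assumes the same Tesseract 3.x type ordering.

```python
import re

# Component names by type index, copied from the log above.
# Assumption: the log being parsed uses this same ordering.
COMPONENTS = [
    "config", "unicharset", "unicharambigs", "inttemp", "pffmtable",
    "normproto", "punc-dawg", "word-dawg", "number-dawg", "freq-dawg",
    "fixed-length-dawgs", "cube-unicharset", "cube-word-dawg", "shapetable",
    "bigram-dawg", "unambig-dawg", "params-training-model",
]

# Matches both "Offset for type 7 is -1" and
# "Offset for type  7 (vie.word-dawg             ) is 1513182".
LINE_RE = re.compile(r"Offset for type\s+(\d+)\s*(?:\(.*?\))?\s*is\s+(-?\d+)")


def parse_offsets(log_text):
    """Return {component name: offset} for every offset line in the log."""
    offsets = {}
    for match in LINE_RE.finditer(log_text):
        index, offset = int(match.group(1)), int(match.group(2))
        name = COMPONENTS[index] if index < len(COMPONENTS) else f"type-{index}"
        offsets[name] = offset
    return offsets


def missing_components(log_text):
    """List the components whose offset is -1 (file not found)."""
    return [name for name, off in parse_offsets(log_text).items() if off == -1]


if __name__ == "__main__":
    # Three lines taken from the question's output.
    question_log = """
    Offset for type 2 is 3726
    Offset for type 7 is -1
    Offset for type 13 is 354078
    """
    print(missing_components(question_log))
```

Under that assumption, type 7 is the word-dawg, and the -1 in the question's output means combine_tessdata did not find a file with the expected name in the training directory.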