Question

I am using Tesseract in my Android application. I defined my "user words" file and added the setVariable line below so that OCR takes the user-words file into account.

String language = "deu";
datapath = getFilesDir() + "/tesseract/";
Tess = new TessBaseAPI();

checkFile(new File(datapath + "tessdata/"));
Tess.setVariable("user_words_suffix", "deu.user-words"); // the line I added for the user-words file
Tess.init(datapath, language);

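I did not define a user-patterns file because my images do not contain any specific patterns. I just copied the UTF-8 text file deu.user-words into the tessdata folder. Is this enough for the OCR configuration? Or should I unpack deu.traineddata, add this file to it, and then repack it? If so, could you give me some hints on how to do that?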

Answer 1

You don't need to specify the language prefix in code:

Tess.setVariable("user_words_suffix", "user-words");

Make sure the file's prefix matches the specified language code, i.e. deu.user-words.
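
To make the naming rule concrete, here is a minimal sketch in plain Java that builds the user-words file name from the language code and the suffix, then writes a UTF-8 word list (one word per line) into a tessdata directory. The class UserWordsFile and its two methods are hypothetical helpers for illustration, not part of Tesseract or tess-two, and the sketch assumes the file is looked up as tessdata/<language>.<suffix>.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper, not part of Tesseract or tess-two.
public class UserWordsFile {

    // Assumption: the file is looked up as tessdata/<language>.<suffix>,
    // so the name is the language code, a dot, then the suffix.
    static File userWordsFile(File tessdataDir, String language, String suffix) {
        return new File(tessdataDir, language + "." + suffix);
    }

    // Writes one word per line, UTF-8 encoded, and returns the file written.
    static File writeUserWords(File tessdataDir, String language, String suffix,
                               List<String> words) throws IOException {
        if (!tessdataDir.isDirectory() && !tessdataDir.mkdirs()) {
            throw new IOException("Cannot create " + tessdataDir);
        }
        File target = userWordsFile(tessdataDir, language, suffix);
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(target), StandardCharsets.UTF_8))) {
            for (String word : words) {
                out.write(word);
                out.write('\n');
            }
        }
        return target;
    }
}
```

In the question's setup this would presumably be called as UserWordsFile.writeUserWords(new File(datapath, "tessdata"), "deu", "user-words", words), so that the suffix passed to setVariable and the file name on disk are derived from the same two strings.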

https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc https://github.com/tesseract-ocr/tesseract/wiki/ControlParams

Adding only user words to Tesseract
