Question

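I'm writing a Qt application that uses the tesseract-ocr library. While testing tesseract, I found that it only recognizes text when I initialize it with the 'eng' parameter. If I specify 'rus' instead, GetUTF8Text() returns something like this: Ð¢Ð<9d>Ð<86>Ð<85> Ð<86>Ð<85> Ð¼Ð°Ð¼Ð°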

The *.traineddata files are located in the /usr/local/share/tessdata directory, which also contains rus.traineddata.

What is the problem?

Answer 1

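I found the solution! The problem is the encoding of the text returned by GetUTF8Text(): it is UTF-8 and has to be decoded as UTF-8 before it goes into a QString.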

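char* recognizedText = tessApi.GetUTF8Text(); // run recognition; the result is a UTF-8 encoded C string
QTextCodec* codec = QTextCodec::codecForName("UTF-8"); // look up Qt's UTF-8 codec
QString decodedText = codec->toUnicode(recognizedText); // decode the UTF-8 bytes into a Unicode QString
delete[] recognizedText; // the buffer returned by GetUTF8Text() is owned by the caller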

It works!
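For anyone wondering where the garbage came from: the output in the question looks like what you get when UTF-8 bytes are read as if each byte were one Latin-1 character. That is an assumption about what happened in the asker's code (for example, building a QString from the raw char* with a Latin-1 default), not something the thread confirms. A minimal standard-library sketch of that misreading, with a hypothetical helper name misreadAsLatin1:

```cpp
#include <string>

// Hypothetical helper for illustration only (not part of Qt or Tesseract).
// Treats every input byte as a Latin-1 character (U+0000..U+00FF) and
// re-encodes it as UTF-8, which reproduces the kind of mojibake shown
// in the question.
std::string misreadAsLatin1(const std::string& utf8Bytes) {
    std::string out;
    for (unsigned char byte : utf8Bytes) {
        if (byte < 0x80) {
            // ASCII bytes mean the same thing in Latin-1 and UTF-8.
            out.push_back(static_cast<char>(byte));
        } else {
            // Code points U+0080..U+00FF take two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            out.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return out;
}
```

Feeding it the UTF-8 bytes of the Russian word "мама" (D0 BC D0 B0 D0 BC D0 B0) yields the UTF-8 encoding of "Ð¼Ð°Ð¼Ð°", which matches the tail of the output in the question. Decoding the bytes as UTF-8, as the answer does with QTextCodec, avoids this; QString::fromUtf8() should be an equivalent shortcut.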
