I'm using R, version 3.3.2. I'm trying to parse some text with the new tesseract package. The image looks like this:
The code is simple:
library(tesseract)
engine <- tesseract(options = list(tessedit_char_whitelist = "0123456789abcdefghijklmnopqrstuvwxyz"))
text <- ocr("some_image_path.png", engine = engine)
The result is:
Too few characters. Skipping this page
Why doesn't it recognize any characters?
Answer:
Because there are too few characters. The Tesseract source defines a minimum:
const int kMinCharactersToTry = 50;
This constant is checked as follows, and the function returns early when the check fails:
// If there are too few characters, skip this page entirely.
if (real_max < kMinCharactersToTry / 2) {
  tprintf("Too few characters. Skipping this page\n");
  return 0;
}
Could you try again with a sample that contains at least 25 characters (kMinCharactersToTry / 2)?