Question

I am using R, version 3.3.2. I am trying to parse some text with the new tesseract package. The image looks like this:

The code is simple:

library(tesseract)
engine <- tesseract(options = list(tessedit_char_whitelist = "0123456789abcdefghijklmnopqrstuvwxyz"))
text <- ocr("some_image_path.png", engine = engine)

The result is:

Too few characters. Skipping this page

Why doesn't it recognize any characters?

Answer 1

Because there are too few characters?

There seems to be a limit in the Tesseract source:

const int kMinCharactersToTry = 50;

That limit is tested here, and the function returns early when the test fails:

// If there are too few characters, skip this page entirely.
  if (real_max < kMinCharactersToTry / 2) {
    tprintf("Too few characters. Skipping this page\n");
    return 0;
  }

Try again with a sample that has at least 25 characters?
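To make the arithmetic explicit, here is a minimal sketch of the threshold implied by the two snippets quoted above. The constant is copied from the quoted source; `PageIsSkipped` is a hypothetical helper written only for illustration and is not part of Tesseract.

```cpp
// Sketch of the threshold arithmetic from the snippets quoted above.
// kMinCharactersToTry is copied from the quoted source; PageIsSkipped
// is a hypothetical helper, not a Tesseract function.
constexpr int kMinCharactersToTry = 50;

// Mirrors the quoted condition: the page is skipped when the count
// (called real_max in the quoted code) is below half the constant.
constexpr bool PageIsSkipped(int real_max) {
  return real_max < kMinCharactersToTry / 2;  // 50 / 2 == 25
}

// Compile-time checks of the boundary.
static_assert(PageIsSkipped(24), "a count of 24 is below the threshold");
static_assert(!PageIsSkipped(25), "a count of 25 passes the check");
```

So, going by the quoted condition alone, a count of 25 is the smallest value that avoids the "Too few characters" message.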
