Tesseract cannot recognize an image after part of it has been erased

Time: 2016-02-22 14:23:26

Tags: python image-processing tesseract

I want to run OCR on a two-digit image after erasing the right-hand digit, in order to get better accuracy. Example: Original, Modified

The images are PNG files (52 × 26 px) with a background color of (192, 192, 192, 255), and each digit has a different color.

Surprisingly, after the right-hand digit is erased, tesseract can no longer recognize the remaining digit.

Result:

original:60
left:

1 answer:

Answer 0 (score: 0)

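Tesseract performs connected component analysis internally. It tries to group text into blocks, and the problem may arise because too many characters are missing from the page. Tesseract offers page segmentation modes, one of which tells it to treat the image as a single character. Try that approach; it may give you the result you need.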
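As a rough illustration of the suggestion above, here is a minimal sketch that calls the `tesseract` command-line tool with page segmentation mode 10 (treat the image as a single character). It assumes `tesseract` is installed and on the PATH. The file name `left.png` and both helper function names are hypothetical. The digit whitelist is an extra assumption that is not part of the answer, and whether it is honored depends on the Tesseract version. Tesseract 3.x releases spell the flag `-psm`, while newer ones use `--psm`, so the sketch lets you choose.

```python
import subprocess


def build_tesseract_command(image_path, psm=10, whitelist="0123456789",
                            legacy_flag=False):
    """Build the tesseract CLI call (hypothetical helper).

    psm=10 asks tesseract to treat the image as a single character.
    legacy_flag=True uses the 3.x spelling "-psm" instead of "--psm".
    """
    psm_flag = "-psm" if legacy_flag else "--psm"
    # "stdout" as the output base makes tesseract print the text.
    cmd = ["tesseract", image_path, "stdout", psm_flag, str(psm)]
    if whitelist:
        # Assumption: restrict recognition to digits; support varies by version.
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


def recognize_single_character(image_path, **kwargs):
    """Run tesseract on one image and return the recognized text."""
    cmd = build_tesseract_command(image_path, **kwargs)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # "left.png" stands in for the image with the right-hand digit erased.
    print(recognize_single_character("left.png"))
```

If single-character mode still returns nothing, cropping the image to the remaining digit or scaling it up before recognition may be worth trying, since a 52 × 26 px image is quite small for OCR.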