Question

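I am new to both tesseract and openCV. I am building a simple Linux application to recognize printed text on paper. Using tesseract I have managed to get text block recognition working, but if the text block contains an integer, that number is omitted. Example input: "Hello, this is my 2014 3D video, which is 1080p" Output: "Hello, this is my 3D video, which is 1080p"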

Has anyone run into this problem before?

openCV: 2.4.9, Tesseract: v3.02, Leptonica: 1.71, OS: Ubuntu 14.04 LTS 64-bit

Regards

Answer 1

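I have not seen this before, but I know you can change this behavior with the page segmentation mode. Are you using automatic page segmentation? Try some of the other settings, and feed the text in by block, by line, or even by word. According to the manual, you can change the page segmentation mode as follows: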

-psm N
Set Tesseract to only run a subset of layout analysis and assume a certain form of image. The options for N are:

0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR.
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
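
To compare modes systematically, something like the following Python sketch may help. It builds the command line using the `-psm N` flag quoted above and checks whether any standalone integer from a known reference text is missing from the OCR output. The helper names (`build_command`, `dropped_integers`, `run_mode`) and the list of modes to try are assumptions made for illustration, not part of Tesseract; the sketch also assumes the `tesseract` binary is on the PATH and writes its result to `<outputbase>.txt`.

```python
import re
import subprocess

# Modes worth comparing (an arbitrary choice): default (3), block (6), line (7), word (8).
CANDIDATE_MODES = (3, 6, 7, 8)


def build_command(image_path, output_base, psm):
    """Return the argv for a Tesseract 3.x run with the given -psm value."""
    if not 0 <= psm <= 10:
        raise ValueError("psm must be between 0 and 10")
    return ["tesseract", image_path, output_base, "-psm", str(psm)]


def standalone_integers(text):
    """Find tokens made only of digits: '2014' matches, '3D' and '1080p' do not."""
    return re.findall(r"\b\d+\b", text)


def dropped_integers(expected, recognized):
    """List integers present in the expected text but absent from the OCR output."""
    found = set(standalone_integers(recognized))
    return [n for n in standalone_integers(expected) if n not in found]


def run_mode(image_path, output_base, psm):
    """Run Tesseract once and return the recognized text (needs tesseract installed)."""
    subprocess.run(build_command(image_path, output_base, psm), check=True)
    with open(output_base + ".txt", encoding="utf-8") as handle:
        return handle.read()
```

You could then call `run_mode("page.png", "out", psm)` for each value in `CANDIDATE_MODES` (the file names here are placeholders) and pass the result to `dropped_integers` together with the text you expect. Note that the flag is spelled `-psm` in the 3.x series used in the question; Tesseract 4 and later use `--psm` instead.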

Tesseract integer recognition in a text block
