Getting only digits from Tesseract - OCR in VB?

Asked: 2014-09-02 17:30:38

Tags: vb.net ocr tesseract digits emgucv

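I need an application that watches numbers on my screen and then does calculations with them. After a few days of researching the best and simplest approach, I found this video (https://www.youtube.com/watch?v=Kjdu8SjEtG0), which led me to use OCR with EMGU-Tesseract on Visual Basic 2010 Express. I understood the video fully and made my own modifications to the code in the video description.

I imported: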

Imports Emgu.CV
Imports Emgu.Util
Imports Emgu.CV.OCR
Imports Emgu.CV.Structure

Then, based on the original code, I wrote:

Dim OCRz As Tesseract = New Tesseract("tessdata", "eng", Tesseract.OcrEngineMode.OEM_TESSERACT_ONLY)
Dim picStc1 As Bitmap = New Bitmap(149, 28)
Dim gfxSTK1 As Graphics = Graphics.FromImage(picStc1)
Dim picNam1 As Bitmap = New Bitmap(149, 28)
Dim gfxNAM1 As Graphics = Graphics.FromImage(picNam1)


Private Sub Timer1_Tick(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Timer1.Tick

    ' Copy the screen area under each PictureBox into its bitmap, then display it.
    gfxSTK1.CopyFromScreen(New Point(Me.Location.X + Stk1.Location.X + 5, Me.Location.Y + Stk1.Location.Y + 24), New Point(0, 0), picStc1.Size)
    Stk1.Image = picStc1

    gfxNAM1.CopyFromScreen(New Point(Me.Location.X + Nome1.Location.X + 5, Me.Location.Y + Nome1.Location.Y + 24), New Point(0, 0), picNam1.Size)
    Nome1.Image = picNam1

End Sub
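
The offsets of 5 and 24 in the capture code above presumably account for the window border and title bar, which can vary with the Windows theme and DPI. A minimal sketch of an alternative, assuming Stk1 and Nome1 are ordinary WinForms controls: Control.PointToScreen converts a control's client origin to screen coordinates directly. The module and helper names below are hypothetical and not part of the original code.

```vb
Imports System.Drawing
Imports System.Windows.Forms

Module CaptureSketch
    ' Hypothetical helper: copies the screen area that starts at the
    ' control's client-area origin into the given bitmap.
    Public Sub CaptureControlArea(ByVal source As Control, ByVal target As Bitmap)
        ' Screen coordinates of the control's top-left client corner,
        ' without hard-coded border or title-bar offsets.
        Dim topLeft As Point = source.PointToScreen(Point.Empty)

        ' Dispose the Graphics object as soon as the copy is done.
        Using g As Graphics = Graphics.FromImage(target)
            g.CopyFromScreen(topLeft, Point.Empty, target.Size)
        End Using
    End Sub
End Module
```

With this helper, the timer handler could call CaptureSketch.CaptureControlArea(Stk1, picStc1) and then assign Stk1.Image = picStc1 as before.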

And when I press the button, this runs:

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

    ' Run OCR on each captured bitmap and show the recognized text.
    OCRz.Recognize(New Image(Of Bgr, Byte)(picStc1))
    BOXSTK1.Text = OCRz.GetText

    OCRz.Recognize(New Image(Of Bgr, Byte)(picNam1))
    BoxNAME1.Text = OCRz.GetText

End Sub

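I am now reading the text from the PictureBox images (picStc1) and (picNam1) through the OCR engine and, after the button is pressed, writing it to the RichTextBoxes (BOXSTK1) and (BoxNAME1).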

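The number in the RichTextBox (BOXSTK1) comes with commas and other symbols, but I only want to capture the digits. I found this (https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits?) but I could not implement it in my project. Any help with this?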

(I am using Emgu 2.9.0.1922 and do not know how to check the Tesseract version.)

3 Answers:

Answer 0 (score: 0)

This digit-only "whitelist" appears to be something that is set when the object is initialized. Check out this question.

So you need to change,

Dim OCRz As Tesseract = New Tesseract("tessdata", "eng", Tesseract.OcrEngineMode.OEM_TESSERACT_ONLY)

to something like this,

Dim OCRz As Tesseract = New Tesseract()
OCRz.SetVariable("tessedit_char_whitelist", "0123456789")
OCRz.Init("tessdata", "eng", False)

Answer 1 (score: 0)

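OK guys, the problem is solved! Thanks to Mr. Jimmy Smith! Now we don't need to train Tesseract at all. It works by converting the OCR value to a string and cleaning it up!

First, define the whitelist with: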

OCRz.SetVariable("tessedit_char_whitelist", ",$0123456789")

Then convert the result to a string like this, strip the unwanted symbols, and print it:

RichTextBox1.Text = Convert.ToString(OCRz.GetText).Replace("$", "").Replace(",", "")

In the end we get this:

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

    ' Restrict recognition to digits plus the "$" and "," that appear on screen.
    OCRz.SetVariable("tessedit_char_whitelist", ",$0123456789")
    OCRz.Init("tessdata", "eng", False)

    ' Recognize the captured bitmap, then strip "$" and "," from the result.
    OCRz.Recognize(New Image(Of Bgr, Byte)(pic))
    RichTextBox1.Text = Convert.ToString(OCRz.GetText).Replace("$", "").Replace(",", "")

End Sub

Thanks again to Jimmy Smith for the quick and very useful answer. Note to self: upvote this guy ;)
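
The chained Replace calls above remove only "$" and ",". As a minimal sketch of a more general cleanup, assuming the goal is to keep nothing but the characters 0 to 9 regardless of what the OCR engine returns, the recognized text can be filtered character by character before it is parsed. The module and function names are hypothetical and use only the .NET base library, not any Emgu API.

```vb
Imports System
Imports System.Text

Module DigitFilterSketch
    ' Hypothetical helper: keeps only the characters "0" to "9"
    ' from the raw OCR text and drops everything else.
    Public Function KeepDigits(ByVal raw As String) As String
        If raw Is Nothing Then Return String.Empty

        Dim sb As New StringBuilder()
        For Each c As Char In raw
            If c >= "0"c AndAlso c <= "9"c Then sb.Append(c)
        Next
        Return sb.ToString()
    End Function

    ' Hypothetical helper: parses the filtered text as a whole number.
    ' Returns False when no digits survive or the value does not fit in a Long.
    Public Function TryReadNumber(ByVal raw As String, ByRef value As Long) As Boolean
        Return Long.TryParse(KeepDigits(raw), value)
    End Function
End Module
```

In the button handler this could be used as RichTextBox1.Text = DigitFilterSketch.KeepDigits(OCRz.GetText). Note that this filter also drops decimal points and minus signs, so it only suits values that are whole, non-negative numbers.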

Answer 2 (score: 0)

On fix and download

Dim OCRz As Tesseract = New Tesseract("tessdata", "eng", Tesseract.OcrEngineMode.OEM_DEFAULT)
Dim pic As Bitmap = New Bitmap(270, 100)
Dim gfx As Graphics = Graphics.FromImage(pic)