How can I preprocess an image with OpenCVdotnet to get better text recognition? I tried the Tesseract wrapper and Puma.NET, but the results are poor. How can I improve them?
#region Tesseract
// src is the (preprocessed) source image; ToBitmap() comes from the OpenCV wrapper in use.
Bitmap pictureInfoArea = src.ToBitmap();
string sTesseract;
// TesseractEngine and Page are IDisposable, and the engine processes only one page at a time,
// so dispose the page before processing the next image.
using (var engine = new TesseractEngine("tessdata/", "rus", EngineMode.Default))
{
    //engine.SetVariable("tessedit_char_whitelist", "0123456789");
    using (var page = engine.Process(pictureInfoArea, PageSegMode.Auto))
    {
        sTesseract = page.GetText();
    }
}
#endregion
#region Puma.NET
string sPuma;
using (var pumaInfoArea = new PumaPage(pictureInfoArea))
{
    // Change the default settings
    pumaInfoArea.FileFormat = PumaFileFormat.TxtAnsi;
    pumaInfoArea.EnableSpeller = true;
    pumaInfoArea.Language = PumaLanguage.Russian;
    // Recognize the page and return the result as a string
    sPuma = pumaInfoArea.RecognizeToString();
    //MessageBox.Show(sPuma);
}
#endregion
Answer (score: 0)
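Here is a tutorial that explains how to train your own language. I suggest installing jTessBoxEditor and using it after you have applied a letter-separation algorithm; it will help you train the patterns better. jTessBoxEditor has a GUI that lets you train your own dataset.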
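The answer does not say which letter-separation algorithm to use. Below is a minimal sketch of one simple option, a vertical projection profile, assuming the text line has already been binarized into a `bool[,]` where `true` marks ink. The class and method names are hypothetical. It only separates glyphs that do not touch, so touching or italic letters would need something more robust (for example connected-component analysis) before the boxes are corrected by hand in jTessBoxEditor.

```csharp
using System;
using System.Collections.Generic;

public static class ProjectionSegmenter
{
    // Splits a binarized text line into per-glyph column ranges.
    // binary[y, x] is true for ink (text) pixels and false for background.
    // Returns (Start, End) column pairs, End inclusive, one per run of columns containing ink.
    public static List<(int Start, int End)> SplitColumns(bool[,] binary)
    {
        int height = binary.GetLength(0);
        int width = binary.GetLength(1);
        var segments = new List<(int Start, int End)>();
        int start = -1; // -1 means "not currently inside a glyph"

        for (int x = 0; x < width; x++)
        {
            // Vertical projection: does this column contain any ink pixel?
            bool hasInk = false;
            for (int y = 0; y < height && !hasInk; y++)
                hasInk = binary[y, x];

            if (hasInk && start < 0)
            {
                start = x;                     // a new glyph begins here
            }
            else if (!hasInk && start >= 0)
            {
                segments.Add((start, x - 1));  // the glyph ended at the previous column
                start = -1;
            }
        }

        if (start >= 0)
            segments.Add((start, width - 1));  // glyph touching the right edge
        return segments;
    }
}
```

Each `(Start, End)` pair is a column range that could seed one box before manual correction; splitting rows with a horizontal projection first gives the vertical extent of each line.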
Or:
Here is another tutorial on training Tesseract 3 for a new language.
Also have a look at these (I haven't tested them): sunnypage.ge/en and http://lib.psnc.pl/Content/358/PSNC_Tesseract-FineReader-report.pdf
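On the preprocessing side of the question, a common first step is global binarization with Otsu's method, which picks the threshold that maximizes the between-class variance of the dark (text) and light (background) pixels. The sketch below assumes an 8-bit grayscale image already flattened into a `byte[]` and uses hypothetical names; in OpenCV the same operation is `threshold` with the `THRESH_BINARY | THRESH_OTSU` flags. Tesseract 3 already binarizes internally with Otsu, so an explicit pass mainly pays off when combined with other cleanup (such as noise removal or rescaling small text) or when you want to inspect what the engine will receive.

```csharp
using System;

public static class OtsuBinarizer
{
    // Computes Otsu's global threshold for 8-bit grayscale pixels.
    // Pixels <= the returned threshold fall into the dark class.
    public static int ComputeThreshold(byte[] gray)
    {
        if (gray == null || gray.Length == 0)
            throw new ArgumentException("Image must contain at least one pixel.", nameof(gray));

        // Histogram of intensity values 0..255.
        var hist = new int[256];
        foreach (byte p in gray) hist[p]++;

        long total = gray.Length;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (double)i * hist[i];

        double sumDark = 0;
        long weightDark = 0;
        double bestVariance = -1;
        int bestThreshold = 0;

        for (int t = 0; t < 256; t++)
        {
            weightDark += hist[t];
            if (weightDark == 0) continue;          // no dark pixels yet
            long weightLight = total - weightDark;
            if (weightLight == 0) break;            // no light pixels left

            sumDark += (double)t * hist[t];
            double meanDark = sumDark / weightDark;
            double meanLight = (sumAll - sumDark) / weightLight;

            // Between-class variance (up to a constant factor); keep the t that maximizes it.
            double diff = meanDark - meanLight;
            double variance = (double)weightDark * weightLight * diff * diff;
            if (variance > bestVariance)
            {
                bestVariance = variance;
                bestThreshold = t;
            }
        }
        return bestThreshold;
    }

    // Maps each pixel to 0 (dark, text) or 255 (light, background).
    public static byte[] Binarize(byte[] gray)
    {
        int t = ComputeThreshold(gray);
        var result = new byte[gray.Length];
        for (int i = 0; i < gray.Length; i++)
            result[i] = gray[i] > t ? (byte)255 : (byte)0;
        return result;
    }
}
```

Dark text on a light background is what Tesseract expects, so an image with light text on a dark background would also need inverting after this step.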