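In my experience, OCR libraries tend to output only the text found in an image, not where that text was found. Is there an OCR library that outputs both the words found in an image and the coordinates (x, y, width, height) where those words were found?
Answer 0 (score: 22)
Most commercial OCR engines return word and character coordinates, but you have to work with their SDKs to extract the information. Even Tesseract OCR returns position information, though it has not been easy to get at. Version 3.01 will make this easier, but a DLL interface is still being worked on.
Unfortunately, most free OCR programs use Tesseract OCR in its basic form and report only the raw ASCII results.
www.transym.com - Transym OCR - outputs coordinates.
www.rerecognition.com - Kasmos engine returns coordinates.
Caere OmniPage, Mitek, ABBYY and Charactell also return character positions.
Answer 1 (score: 14)
I'm using TessNet (a Tesseract C# wrapper), and I get word coordinates with the following code: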
Bitmap image = new Bitmap(@"u:\user files\bwalker\2849257.tif");
tessnet2.Tesseract ocr = new tessnet2.Tesseract();
// Restrict recognition to these characters (for digits only, use "0123456789")
ocr.SetVariable("tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.,$-/#&=()\"':?");
// Point Init at the folder that holds the correct tessdata
ocr.Init(@"C:\Users\bwalker\Documents\Visual Studio 2010\Projects\tessnetWinForms\tessnetWinForms\bin\Release\", "eng", false);
List<tessnet2.Word> result = ocr.DoOCR(image, System.Drawing.Rectangle.Empty);
// Append one line per word: confidence, text, then the bounding-box edges.
// The using block closes the writer, so no explicit Close() is needed.
using (StreamWriter writer = new StreamWriter(@"U:\user files\bwalker\ocrTesting2.txt", true))
{
    foreach (tessnet2.Word word in result)
    {
        writer.WriteLine(word.Confidence + ", " + word.Text + ", " + word.Top + ", " + word.Bottom + ", " + word.Left + ", " + word.Right);
    }
}
MessageBox.Show("Completed");
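The code above writes the box edges (Top, Bottom, Left, Right), while the question asks for (x, y, width, height). A minimal Python sketch of that conversion follows. The names `Rect` and `edges_to_rect` are made up for illustration, and the sketch assumes the origin is the top-left corner of the image and that width is simply right minus left; check this against your engine's convention.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Hypothetical container for a word box in (x, y, width, height) form."""
    x: int
    y: int
    width: int
    height: int


def edges_to_rect(left: int, top: int, right: int, bottom: int) -> Rect:
    """Convert edge coordinates to (x, y, width, height).

    Assumption: the origin is the top-left corner, so top <= bottom.
    """
    if right < left or bottom < top:
        raise ValueError("right/bottom must not be smaller than left/top")
    return Rect(left, top, right - left, bottom - top)


if __name__ == "__main__":
    print(edges_to_rect(left=274, top=1307, right=386, bottom=1342))
```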
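Answer 2 (score: 2)
You could also take a look at the Gamera framework (http://gamera.informatik.hsnr.de/). It is a set of tools that lets you build your own OCR engine. The fastest way, however, is to use Tesseract or OCRopus with hOCR (http://en.wikipedia.org/wiki/HOCR) output.
Answer 3 (score: 2)
You can pass hocr to tesseract as the "configfile" argument, like this: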
tesseract syllabus-page1.jpg syllabus-page1 hocr
This outputs a mostly-HTML5 document containing elements like these:
<div class='ocr_page' id='page_1' title='image "syllabus-page1.jpg"; bbox 0 0 2531 3272; ppageno 0'>
  <div class="ocr_carea" id="block_1_4" title="bbox 265 1183 2147 1778">
    <p class="ocr_par" dir="ltr" id="par_1_8" title="bbox 274 1305 655 1342">
      <span class="ocr_line" id="line_1_14" title="bbox 274 1305 655 1342; baseline -0.005 0; x_size 46.378059; x_descenders 10.378059; x_ascenders 12">
        <span class="ocrx_word" id="word_1_78" title="bbox 274 1307 386 1342; x_wconf 90" lang="eng" dir="ltr">needs</span>
        <span class="ocrx_word" id="word_1_79" title="bbox 402 1318 459 1342; x_wconf 90" lang="eng" dir="ltr">are</span>
        <span class="ocrx_word" id="word_1_80" title="bbox 474 1305 655 1341; x_wconf 86" lang="eng" dir="ltr">different:</span>
      </span>
    </p>
    ...
  </div>
  ...
</div>
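While I'm pretty sure this isn't how you're supposed to use XML, I found it easier than digging into the tesseract API.
P.S. I realize that several comments and answers mention this solution, but none of them actually show how to use the hocr option or describe the output you get from it.
Answer 4 (score: 1)
For Java developers:
I recommend using Tesseract with Tess4j.
One of the Tess4j tests contains an example of how to find the words in an image.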
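For anyone who wants to go from that hOCR output to (x, y, width, height) tuples, here is a rough Python sketch using only the standard library. It assumes the words are `span` elements with class `ocrx_word` whose `title` holds `bbox left top right bottom` and optionally `x_wconf`, as in the excerpt above. The names `HocrWordParser`, `parse_title` and `parse_hocr_words` are made up for this example.

```python
from html.parser import HTMLParser


def parse_title(title):
    """Split an hOCR title such as 'bbox 1 2 3 4; x_wconf 90' into {name: [values]}."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


class HocrWordParser(HTMLParser):
    """Collects (text, x, y, width, height, confidence) for each ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._props = None  # title properties while inside a word span
        self._text = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            self._props = parse_title(attr.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._props is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "span" or self._props is None:
            return
        props, self._props = self._props, None
        if len(props.get("bbox", [])) != 4:
            return  # no usable bounding box on this word
        left, top, right, bottom = map(int, props["bbox"])
        conf = float(props["x_wconf"][0]) if props.get("x_wconf") else None
        text = "".join(self._text).strip()
        self.words.append((text, left, top, right - left, bottom - top, conf))


def parse_hocr_words(hocr_text):
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


SAMPLE = """
<span class="ocr_line" id="line_1_14" title="bbox 274 1305 655 1342; baseline -0.005 0">
<span class="ocrx_word" id="word_1_78" title="bbox 274 1307 386 1342; x_wconf 90" lang="eng" dir="ltr">needs</span>
<span class="ocrx_word" id="word_1_79" title="bbox 402 1318 459 1342; x_wconf 90" lang="eng" dir="ltr">are</span>
<span class="ocrx_word" id="word_1_80" title="bbox 474 1305 655 1341; x_wconf 86" lang="eng" dir="ltr">different:</span>
</span>
"""

if __name__ == "__main__":
    for word in parse_hocr_words(SAMPLE):
        print(word)
```

An HTML parser is used rather than an XML one because the output is only "mostly" well-formed markup; a stricter XML parser may also work if your tesseract build emits valid XHTML.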
public void testResultIterator() throws Exception {
    logger.info("TessBaseAPIGetIterator");
    File tiff = new File(this.testResourcesDataPath, "eurotext.tif");
    BufferedImage image = ImageIO.read(new FileInputStream(tiff)); // requires the jai-imageio lib to read TIFF
    ByteBuffer buf = ImageIOHelper.convertImageData(image);
    int bpp = image.getColorModel().getPixelSize();
    int bytespp = bpp / 8;
    int bytespl = (int) Math.ceil(image.getWidth() * bpp / 8.0);
    api.TessBaseAPIInit3(handle, datapath, language);
    api.TessBaseAPISetPageSegMode(handle, TessPageSegMode.PSM_AUTO);
    api.TessBaseAPISetImage(handle, buf, image.getWidth(), image.getHeight(), bytespp, bytespl);
    ETEXT_DESC monitor = new ETEXT_DESC();
    TimeVal timeout = new TimeVal();
    timeout.tv_sec = new NativeLong(0L); // time > 0 causes blank output
    monitor.end_time = timeout;
    ProgressMonitor pmo = new ProgressMonitor(monitor);
    pmo.start();
    api.TessBaseAPIRecognize(handle, monitor);
    logger.info("Message: " + pmo.getMessage());
    TessResultIterator ri = api.TessBaseAPIGetIterator(handle);
    TessPageIterator pi = api.TessResultIteratorGetPageIterator(ri);
    api.TessPageIteratorBegin(pi);
    logger.info("Bounding boxes:\nchar(s) left top right bottom confidence font-attributes");
    int level = TessPageIteratorLevel.RIL_WORD;
    // int height = image.getHeight();
    do {
        Pointer ptr = api.TessResultIteratorGetUTF8Text(ri, level);
        String word = ptr.getString(0);
        api.TessDeleteText(ptr);
        float confidence = api.TessResultIteratorConfidence(ri, level);
        IntBuffer leftB = IntBuffer.allocate(1);
        IntBuffer topB = IntBuffer.allocate(1);
        IntBuffer rightB = IntBuffer.allocate(1);
        IntBuffer bottomB = IntBuffer.allocate(1);
        api.TessPageIteratorBoundingBox(pi, level, leftB, topB, rightB, bottomB);
        int left = leftB.get();
        int top = topB.get();
        int right = rightB.get();
        int bottom = bottomB.get();
        /******************************************/
        /* COORDINATES AND WORDS ARE PRINTED HERE */
        /******************************************/
        // One word per line: text, left, top, right, bottom, confidence
        System.out.println(String.format("%s %d %d %d %d %f", word, left, top, right, bottom, confidence));
        // training box coordinates:
        // logger.info(String.format("%s %d %d %d %d", word, left, height - bottom, right, height - top));
        IntBuffer boldB = IntBuffer.allocate(1);
        IntBuffer italicB = IntBuffer.allocate(1);
        IntBuffer underlinedB = IntBuffer.allocate(1);
        IntBuffer monospaceB = IntBuffer.allocate(1);
        IntBuffer serifB = IntBuffer.allocate(1);
        IntBuffer smallcapsB = IntBuffer.allocate(1);
        IntBuffer pointSizeB = IntBuffer.allocate(1);
        IntBuffer fontIdB = IntBuffer.allocate(1);
        String fontName = api.TessResultIteratorWordFontAttributes(ri, boldB, italicB, underlinedB, monospaceB,
                serifB, smallcapsB, pointSizeB, fontIdB);
        boolean bold = boldB.get() == TRUE;
        boolean italic = italicB.get() == TRUE;
        boolean underlined = underlinedB.get() == TRUE;
        boolean monospace = monospaceB.get() == TRUE;
        boolean serif = serifB.get() == TRUE;
        boolean smallcaps = smallcapsB.get() == TRUE;
        int pointSize = pointSizeB.get();
        int fontId = fontIdB.get();
        logger.info(String.format(" font: %s, size: %d, font id: %d, bold: %b,"
                + " italic: %b, underlined: %b, monospace: %b, serif: %b, smallcap: %b", fontName, pointSize,
                fontId, bold, italic, underlined, monospace, serif, smallcaps));
    } while (api.TessPageIteratorNext(pi, level) == TRUE);
    assertTrue(true);
}
Answer 5 (score: 1)
The Google Vision API does this. https://cloud.google.com/vision/docs/detecting-text
"description": "Wake up human!\n",
"boundingPoly": {
  "vertices": [
    {
      "x": 29,
      "y": 394
    },
    {
      "x": 570,
      "y": 394
    },
    {
      "x": 570,
      "y": 466
    },
    {
      "x": 29,
      "y": 466
    }
  ]
}
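The response gives four corner vertices rather than (x, y, width, height). A small Python sketch of one way to convert them follows; it takes the axis-aligned box that encloses all vertices, which for rotated text is larger than the text itself. `vertices_to_rect` is a made-up helper name, and treating a missing "x" or "y" as 0 is an assumption on my part (zero-valued fields may be left out of the JSON).

```python
def vertices_to_rect(vertices):
    """Return (x, y, width, height) of the axis-aligned box enclosing the vertices.

    Assumption: a vertex with no "x" or "y" key has that coordinate equal to 0.
    """
    if not vertices:
        raise ValueError("at least one vertex is required")
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    x, y = min(xs), min(ys)
    return x, y, max(xs) - x, max(ys) - y


# The annotation from the answer above, as a Python dict.
ANNOTATION = {
    "description": "Wake up human!\n",
    "boundingPoly": {
        "vertices": [
            {"x": 29, "y": 394},
            {"x": 570, "y": 394},
            {"x": 570, "y": 466},
            {"x": 29, "y": 466},
        ]
    },
}

if __name__ == "__main__":
    print(vertices_to_rect(ANNOTATION["boundingPoly"]["vertices"]))
```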
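Answer 6 (score: 0)
ABCocr.NET (our component) lets you obtain the coordinates of each word found. The values are accessible through the Word.Bounds property, which simply returns a System.Drawing.Rectangle.
The following example shows how to OCR an image with ABCocr.NET and output the information you need: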
using System;
using System.Drawing;
using WebSupergoo.ABCocr3;

namespace abcocr {
    class Program {
        static void Main(string[] args) {
            Bitmap bitmap = (Bitmap)Bitmap.FromFile("example.png");
            Ocr ocr = new Ocr();
            ocr.SetBitmap(bitmap);
            foreach (Word word in ocr.Page.Words) {
                Console.WriteLine("{0}, X: {1}, Y: {2}, Width: {3}, Height: {4}",
                    word.Text,
                    word.Bounds.X,
                    word.Bounds.Y,
                    word.Bounds.Width,
                    word.Bounds.Height);
            }
        }
    }
}
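Disclosure: posted by a member of the WebSupergoo team.
Answer 7 (score: 0)
hOCR is one of the output formats of the Tesseract OCR engine. It contains each word and its coordinates, along with some extra information such as the recognition confidence for each word.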
Answer 8 (score: 0)
The free OCR.space OCR API returns word coordinates once you set the parameter isOverlayRequired = true:
"ParsedResults" : [
  {
    "TextOverlay" : {
      "Lines" : [
        {
          "Words": [
            {
              "WordText": "Word 1",
              "Left": 106,
              "Top": 91,
              "Height": 9,
              "Width": 11
            },
            {
              "WordText": "Word 2",
              "Left": 121,
              "Top": 90,
              "Height": 13,
              "Width": 51
            }
.
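Given a response shaped like the excerpt above, a short Python sketch can flatten it into (text, x, y, width, height) tuples. The field names come from the excerpt; the helper name `iter_word_boxes` is made up, and the closing brackets in `SAMPLE` are added only so the truncated excerpt parses as JSON.

```python
import json

# The excerpt above, closed off so that it is valid JSON (an assumption for this sketch).
SAMPLE = json.loads("""
{
  "ParsedResults": [
    {
      "TextOverlay": {
        "Lines": [
          {
            "Words": [
              {"WordText": "Word 1", "Left": 106, "Top": 91, "Height": 9, "Width": 11},
              {"WordText": "Word 2", "Left": 121, "Top": 90, "Height": 13, "Width": 51}
            ]
          }
        ]
      }
    }
  ]
}
""")


def iter_word_boxes(response):
    """Yield (text, x, y, width, height) for every word in the overlay."""
    for result in response.get("ParsedResults") or []:
        overlay = result.get("TextOverlay") or {}
        for line in overlay.get("Lines") or []:
            for word in line.get("Words") or []:
                yield (word["WordText"], word["Left"], word["Top"],
                       word["Width"], word["Height"])


if __name__ == "__main__":
    for box in iter_word_boxes(SAMPLE):
        print(box)
```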