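I'm working with Tesseract and already have OCR running. I want to preprocess the images so that the OCR results improve. Currently I only convert the image to monochrome and scale it to twice its original size. Even after that, I still have problems with smaller fonts.
I tried looking this up, and here is one of the best answers I could find. Unfortunately, it works on Bitmap, and I can't find any native Java class for Bitmap. There is also an answer with Java code, but it again uses Bitmap and doesn't say which package it comes from.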
Where does BitmapImageUtil.convertToGrayscale() come from?
Code:
private String testOcr(String fileLocation, int attachId) {
    try {
        File imageFile = new File(fileLocation);
        BufferedImage img = ImageIO.read(imageFile);
        // Random file name for the preprocessed image
        String identifier = new BigInteger(130, random).toString(32);
        String blackAndWhiteImage = previewPath + identifier + ".png";
        File outputfile = new File(blackAndWhiteImage);
        // Convert to grayscale, then scale to twice the original size
        BufferedImage bufferedImage = BitmapImageUtil.convertToGrayscale(img, new Dimension(img.getWidth(), img.getHeight()));
        bufferedImage = Scalr.resize(bufferedImage, img.getWidth() * 2, img.getHeight() * 2);
        ImageIO.write(bufferedImage, "png", outputfile);
        ITesseract instance = Tesseract.getInstance();
        // Point to one folder above tessdata directory, must contain training data
        instance.setDatapath("/usr/share/tesseract-ocr/");
        // ISO 639-3 standard
        instance.setLanguage("deu");
        String result = instance.doOCR(outputfile);
        // result processing with regex.
        return result;
    } catch (IOException | TesseractException e) {
        // Reading or writing the image, or the OCR run itself, failed
        throw new IllegalStateException("OCR failed for " + fileLocation, e);
    }
}
Answer 0 (score: 0)
BitmapImageUtil comes from the Apache FOP project ("FOP" = "Formatting Objects Processor").
The package is org.apache.fop.util.bitmap.
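If you would rather not pull in FOP just for this step, the grayscale conversion and the 2x upscale can also be sketched with plain JDK classes. This is not FOP's implementation, only a minimal sketch: the class name OcrPreprocess and the method toGrayscaleScaled are hypothetical, and the bicubic interpolation is my own assumption about what suits small text, not something the question specifies.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class OcrPreprocess {

    // Hypothetical helper: draws the source into an 8-bit grayscale image
    // whose width and height are the source's multiplied by `factor`.
    public static BufferedImage toGrayscaleScaled(BufferedImage src, int factor) {
        int w = src.getWidth() * factor;
        int h = src.getHeight() * factor;
        // TYPE_BYTE_GRAY makes Java2D convert colors to gray while drawing
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = out.createGraphics();
        try {
            // Assumption: bicubic interpolation; swap the hint to compare results
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return out;
    }

    public static void main(String[] args) {
        // Synthetic 40x20 image stands in for the scanned page
        BufferedImage src = new BufferedImage(40, 20, BufferedImage.TYPE_INT_RGB);
        BufferedImage out = toGrayscaleScaled(src, 2);
        System.out.println(out.getWidth() + "x" + out.getHeight()); // prints 80x40
    }
}
```

In the question's method, the result of toGrayscaleScaled(img, 2) could be passed to ImageIO.write in place of the BitmapImageUtil and Scalr calls. Whether this helps with the small fonts is something you would have to test on your own images.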