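We are using Tess4J / Tesseract to perform OCR in a web app. Everything works on Windows, but when the app is deployed on a Linux machine (CentOS 6.8), the program crashes and takes the Apache Tomcat server down with it.
We read multiple (different) files at the same time. When we run OCR, it runs for about one minute and then fails with a fatal error. Can you suggest how to fix this?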
A fatal error has been detected by the Java Runtime Environment:
SIGSEGV (0xb) at pc=0x00007f7d5934ff90, pid=17649, tid=140176377489152
JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
Problematic frame:
C [libtesseract.so.3.0.2+0x22cf90] tesseract::HistogramRect(unsigned char const*, int, int, int, int, int, int, int*)+0x70
Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
Answer 0: (score: 0)
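I fixed it by resizing the image to a fixed size before passing it to Tess4J (in JavaCV you could also scale by a percentage instead).
Here is an example of my resize method.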
    // Resize the source image to a fixed 600x480, keeping its depth and channel count.
    public static IplImage resize(IplImage img_source) {
        IplImage resized = IplImage.create(600, 480, img_source.depth(), img_source.nChannels());
        cvResize(img_source, resized);
        return resized;
    }
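A fixed 600x480 target distorts images with a different aspect ratio. As a rough alternative sketch, not part of the original answer, the scaling can also be done on a BufferedImage with plain java.awt so that the aspect ratio is preserved. The class name ImageScaler and the method scaleToFit are hypothetical, and the 600x480 limit in the usage note is only borrowed from the method above.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper (not from the original answer): shrink an image so it
// fits inside a bounding box while keeping its aspect ratio.
final class ImageScaler {

    static BufferedImage scaleToFit(BufferedImage img, int maxWidth, int maxHeight) {
        // Pick the smaller of the two ratios so both dimensions fit.
        double ratio = Math.min((double) maxWidth / img.getWidth(),
                                (double) maxHeight / img.getHeight());
        if (ratio >= 1.0) {
            return img; // already within the limits, return unchanged
        }
        int w = Math.max(1, (int) Math.round(img.getWidth() * ratio));
        int h = Math.max(1, (int) Math.round(img.getHeight() * ratio));

        BufferedImage scaled = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(img, 0, 0, w, h, null);
        } finally {
            g.dispose(); // release the graphics context
        }
        return scaled;
    }
}
```

Calling ImageScaler.scaleToFit(bi, 600, 480) before OCR would keep large inputs bounded. Whether that alone avoids the crash on a given system is an assumption that needs testing.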
Then I do my Tesseract extraction as follows:
    // CvHandler and getTess() are the author's own exception type and tessdata-path helper.
    public static String extract(BufferedImage bi, Rectangle r) throws CvHandler, IOException, TesseractException {
        ITesseract tess = new Tesseract();
        String tessPath = getTess();
        tess.setPageSegMode(1); // 1 = automatic page segmentation with OSD
        tess.setLanguage("eng");
        tess.setDatapath(tessPath);
        tess.setOcrEngineMode(TessOcrEngineMode.OEM_DEFAULT);
        // Skip the dictionary word lists and hOCR output; only A-Z and 0-9 are expected.
        tess.setTessVariable("load_system_dawg", "false");
        tess.setTessVariable("load_freq_dawg", "false");
        tess.setTessVariable("tessedit_create_hocr", "0");
        tess.setTessVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");
        String result = "";
        if (!r.getBounds().isEmpty()) {
            // OCR only the requested region.
            try {
                result = tess.doOCR(bi, r);
            } catch (TesseractException e) {
                throw new CvHandler(e.getMessage());
            }
        } else {
            // Empty rectangle: OCR the whole image.
            result = tess.doOCR(bi);
        }
        return result;
    }
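The extract method only checks that the rectangle is non-empty. Once the image has been resized, a rectangle computed against the original dimensions may extend past the new bounds. The following is a small precautionary sketch, assuming (not confirmed by the crash log) that an out-of-bounds region is worth ruling out. OcrRegion and clampToImage are hypothetical names.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper (not from the original answer): clip an OCR region to
// the image so the native library never receives out-of-bounds coordinates.
final class OcrRegion {

    /** Returns the part of region that lies inside image, or null if there is no overlap. */
    static Rectangle clampToImage(BufferedImage image, Rectangle region) {
        if (region == null) {
            return null;
        }
        Rectangle bounds = new Rectangle(0, 0, image.getWidth(), image.getHeight());
        Rectangle clipped = region.intersection(bounds);
        // intersection() yields an empty rectangle when the two do not overlap.
        return clipped.isEmpty() ? null : clipped;
    }
}
```

With this in place, extract could call tess.doOCR(bi, clipped) when the result is non-null and fall back to tess.doOCR(bi) otherwise.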
Helper method to convert an IplImage to a BufferedImage (Source):
    public static BufferedImage convertIplToBuffered(IplImage img) {
        OpenCVFrameConverter.ToIplImage grabberConverter = new OpenCVFrameConverter.ToIplImage();
        Java2DFrameConverter paintConverter = new Java2DFrameConverter();
        // IplImage -> Frame -> BufferedImage
        Frame frame = grabberConverter.convert(img);
        BufferedImage img_result = paintConverter.getBufferedImage(frame, 1);
        return img_result;
    }