Question

I am using the tess4j API to read digits from an image.

The code is as follows:

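import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

// Wrapper class added so the snippet compiles; its name is illustrative.
public class DigitReader {

    public static void main(String[] args) {
        final File imageFile = new File("C:\\Users\\goku\\Desktop\\myimage.png");
        // Printed unconditionally: the file's existence is not checked here.
        System.out.println("Image found");

        final ITesseract instance = new Tesseract();
        // Restrict recognition to digits only.
        instance.setTessVariable("tessedit_char_whitelist", "0123456789");
        instance.setDatapath("C:\\Users\\goku\\Downloads\\Tess4J");
        instance.setLanguage("eng");

        try {
            String result = instance.doOCR(imageFile);
            System.out.println(result);
        } catch (TesseractException e) {
            e.printStackTrace();
        }
    }
}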

The image is attached: myimage.png

The program reads the digits incorrectly, and I cannot find the problem.

Output:

1 1 3 251

Regards, Vasu

Answer 1

Rescaling the image to 300 DPI gives the correct result.
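
One way to approximate that rescaling with only the JDK is to enlarge the image by the ratio of target DPI to source DPI before handing it to OCR. This is a minimal sketch, not the answerer's code: the class name `DpiRescaler`, the bicubic interpolation, and the assumed source resolution of 96 DPI are all assumptions, and the sketch changes pixel dimensions only (it does not write DPI metadata).

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper: upscales an image as if converting sourceDpi -> targetDpi.
class DpiRescaler {

    /** Scales the image by targetDpi / sourceDpi using bicubic interpolation. */
    static BufferedImage rescale(BufferedImage src, double sourceDpi, double targetDpi) {
        if (sourceDpi <= 0 || targetDpi <= 0) {
            throw new IllegalArgumentException("DPI values must be positive");
        }
        double factor = targetDpi / sourceDpi;
        int width = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int height = Math.max(1, (int) Math.round(src.getHeight() * factor));

        BufferedImage dst = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    // Usage: java DpiRescaler <input.png> <output.png> [sourceDpi]
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("Usage: java DpiRescaler <input.png> <output.png> [sourceDpi]");
            return;
        }
        // 96 DPI is an assumed default for screen-captured images.
        double sourceDpi = args.length > 2 ? Double.parseDouble(args[2]) : 96.0;
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) {
            throw new IOException("Unsupported or unreadable image: " + args[0]);
        }
        ImageIO.write(rescale(src, sourceDpi, 300.0), "png", new File(args[1]));
    }
}
```

The returned `BufferedImage` can be passed to `instance.doOCR(BufferedImage)` directly instead of being written back to disk.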

Answer 2

The image can be edited with im4java (ImageMagick) so that tess4j (Tesseract) reads it correctly.

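
As a rough stand-in for that kind of ImageMagick cleanup (this is not im4java and not the answerer's code), a JDK-only sketch that converts each pixel to grayscale and applies a fixed threshold might look like the following. The class name `OcrPreprocess`, the luma weights, and the threshold of 128 are assumptions chosen for illustration; a real image may need a different threshold.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper: turns an image into pure black and white before OCR.
class OcrPreprocess {

    /** Pixels with luma >= threshold become white; all others become black. */
    static BufferedImage binarize(BufferedImage src, int threshold) {
        int width = src.getWidth();
        int height = src.getHeight();
        BufferedImage dst = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of the common 0.299/0.587/0.114 luma weights.
                int luma = (r * 299 + g * 587 + b * 114) / 1000;
                dst.setRGB(x, y, luma >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return dst;
    }

    // Usage: java OcrPreprocess <input.png> <output.png>
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("Usage: java OcrPreprocess <input.png> <output.png>");
            return;
        }
        BufferedImage src = ImageIO.read(new File(args[0]));
        if (src == null) {
            throw new IOException("Unsupported or unreadable image: " + args[0]);
        }
        ImageIO.write(binarize(src, 128), "png", new File(args[1]));
    }
}
```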

Answer 3

It may be the trained data. I used the trained data that ships with the Windows binary tesseract-ocr-w64-setup-v4.1.0.20190314.exe from https://digi.bib.uni-mannheim.de/tesseract/, with the data path set as follows:

instance.setDatapath("C:\\Program Files\\Tesseract-OCR\\tessdata");

I did get a warning about the resolution, but the result was correct: 471871882819
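
Since the data path is the suspect here, a quick check that the configured folder really contains the language file can rule out a misconfigured path before running OCR. The sketch below is an illustration only: the class name `TessdataCheck` is hypothetical, and it assumes the directory passed in is the one that directly holds the `.traineddata` files, as in the path above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: checks that <language>.traineddata exists in a folder.
class TessdataCheck {

    /** Returns true if tessdataDir contains a regular file named <language>.traineddata. */
    static boolean hasLanguage(Path tessdataDir, String language) {
        return Files.isRegularFile(tessdataDir.resolve(language + ".traineddata"));
    }

    // Usage: java TessdataCheck <tessdataDir> [language]
    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("Usage: java TessdataCheck <tessdataDir> [language]");
            return;
        }
        String language = args.length > 1 ? args[1] : "eng";
        Path dir = Paths.get(args[0]);
        if (hasLanguage(dir, language)) {
            System.out.println("Found " + language + ".traineddata in " + dir);
        } else {
            System.out.println("Missing " + language + ".traineddata in " + dir);
        }
    }
}
```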

Tess4j image reading
