Using Tesseract OCR with Java

Asked: 2015-07-20 16:24:27

Tags: java ocr tesseract

I am trying to use Tesseract OCR from Java. I want the text recognized from the image to be stored in a separate text file, but I end up with an empty test.txt file.

Here is the code:

    import java.io.FileOutputStream;
    import org.bytedeco.javacpp.*;
    import org.junit.Test;
    import static org.bytedeco.javacpp.lept.*;
    import static org.bytedeco.javacpp.tesseract.*;
    import static org.junit.Assert.assertTrue;
    import java.io.File;


    public class BasicTesseractExampleTest {
        @Test
        public void givenTessBaseApi_whenImageOcrd_thenTextDisplayed() throws Exception {
            BytePointer outText;

            TessBaseAPI api = new TessBaseAPI();
            // Initialize tesseract-ocr with English, without specifying tessdata path
            if (api.Init(".", "ENG") != 0) {
                System.err.println("Could not initialize tesseract.");
                System.exit(1);
            }

            // Open input image with leptonica library
            PIX image = pixRead("IMG_0012 (1).jpg");
            api.SetImage(image);

            // Get OCR result
            outText = api.GetUTF8Text();
            String string = outText.getString();
            assertTrue(!string.isEmpty());
            System.out.println("OCR output:\n" + string);
            FileOutputStream file = new FileOutputStream("test.txt");
            TeePrintStream tee = new TeePrintStream(file, System.out);
            System.setOut(tee);

            // Destroy used object and release memory
            api.End();
            outText.deallocate();
            pixDestroy(image);
        }
    }

1 answer:

Answer 0 (score: 0)

Change the method's return type to String, and call System.out.println(tee); in the main method.
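A likely cause of the empty file in the question's code: `System.out.println(...)` runs before `System.setOut(tee)`, and nothing is printed afterwards, so no text ever passes through `tee`. Meanwhile `new FileOutputStream("test.txt")` has already created the file, which therefore stays empty. If the goal is only to save the recognized text, a simpler route may be to write the string to the file directly. The following is a minimal sketch under that assumption. The class name `OcrTextWriter`, the method `writeText`, and the sample string are hypothetical stand-ins; in the test method the string would come from `outText.getString()`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OcrTextWriter {

    // Writes the given text to fileName as UTF-8.
    // The file is created if missing and overwritten if it already exists.
    public static Path writeText(String text, String fileName) throws IOException {
        Path target = Paths.get(fileName);
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder for the OCR result (outText.getString() in the question).
        String ocrResult = "sample OCR text";

        Path written = writeText(ocrResult, "test.txt");
        System.out.println("Wrote " + Files.size(written) + " bytes to " + written);
    }
}
```

If the tee approach is kept instead, the `System.setOut(tee)` call would need to come before the `println` that prints the OCR output, so that the text is actually routed through `tee`.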