Question

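Is there a way to tell Tess4J to OCR only a certain number of pages or characters?

I may be working with a 200+ page PDF, but I really only want to OCR the first page, if that!

As I understand it, the common sample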

package net.sourceforge.tess4j.example;

import java.io.File;
import net.sourceforge.tess4j.*;

public class TesseractExample {

    public static void main(String[] args) {
        File imageFile = new File("eurotext.tif");
        Tesseract instance = Tesseract.getInstance();  // JNA Interface Mapping
        // Tesseract1 instance = new Tesseract1(); // JNA Direct Mapping

        try {
            String result = instance.doOCR(imageFile);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}

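would attempt to OCR the entire 200+ page document into a single String.

For my particular case, that is far more than I need. I'm worried it could take a very long time if I let it process all 200+ pages and then just substring the first 500 characters or so.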

Answer 1

The library has a PdfUtilities class that can extract specific pages from a PDF.
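A minimal sketch of that approach, assuming a recent Tess4J release in which the class is net.sourceforge.tess4j.util.PdfUtilities and exposes splitPdf(File, File, int, int) with 1-based page numbers. Older releases may place the class in a different package or use a different signature, so check the Javadoc for your version. The file name input.pdf, the class name FirstPageOcr, and the helper firstChars are made up for illustration. The idea is to write page 1 to a temporary PDF and OCR only that file:

```java
import java.io.File;
import java.io.IOException;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import net.sourceforge.tess4j.util.PdfUtilities;

public class FirstPageOcr {

    // Hypothetical helper: returns at most `limit` characters,
    // and does not throw when the text is shorter than the limit.
    static String firstChars(String text, int limit) {
        return text.substring(0, Math.min(limit, text.length()));
    }

    public static void main(String[] args) throws IOException {
        File fullPdf = new File("input.pdf"); // hypothetical 200+ page PDF
        File firstPage = File.createTempFile("first-page", ".pdf");
        try {
            // Assumed signature: splitPdf(input, output, firstPage, lastPage).
            // Copies only page 1 into the temporary file.
            PdfUtilities.splitPdf(fullPdf, firstPage, 1, 1);

            // Newer releases use a public constructor; the sample in the
            // question uses Tesseract.getInstance() from an older release.
            // Depending on your setup you may also need to set the tessdata path.
            Tesseract instance = new Tesseract();
            String result = instance.doOCR(firstPage);
            System.out.println(firstChars(result, 500));
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        } finally {
            firstPage.delete(); // clean up the single-page copy
        }
    }
}
```

Because only the one-page file is passed to doOCR, the remaining pages are never rendered or recognized, which is where the time would otherwise go.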
