Question

我正在为Android构建一个OCR应用程序，我使用tesseract ocr引擎。不知何故，每当我在照片上使用引擎时，它都会返回一个空文本。这是我的代码：

public String detectText(Bitmap bitmap) {
    TessBaseAPI tessBaseAPI = new TessBaseAPI();
    String mDataDir = setTessData();
    tessBaseAPI.setDebug(true);
    tessBaseAPI.init(mDataDir + File.separator, "eng");
    tessBaseAPI.setImage(bitmap);
    tessBaseAPI.setPageSegMode(TessBaseAPI.OEM_TESSERACT_ONLY);
    String text = tessBaseAPI.getUTF8Text();

    tessBaseAPI.end();

    return text;
}

private String setTessData(){
    String mDataDir = this.getExternalFilesDir("data").getAbsolutePath();
    String mTrainedDataPath = mDataDir + File.separator + "tessdata";
    String mLang = "eng";
    // Checking if language file already exist inside data folder
    File dir = new File(mTrainedDataPath);
    if (!dir.exists()) {
        if (!dir.mkdirs()) {
            //showDialogFragment(SD_ERR_DIALOG, "sd_err_dialog");
        } else {
        }
    }

    if (!(new File(mTrainedDataPath + File.separator + mLang + ".traineddata")).exists()) {

        // If English or Hebrew, we just copy the file from assets
        if (mLang.equals("eng") || mLang.equals("heb")){
            try {
                AssetManager assetManager = context.getAssets();
                InputStream in = assetManager.open(mLang + ".traineddata");
                OutputStream out = new FileOutputStream(mTrainedDataPath + File.separator + mLang + ".traineddata");
                copyFile(in, out);
                //Toast.makeText(context, getString(R.string.selected_language) + " " + mLangArray[mLangID], Toast.LENGTH_SHORT).show();
                //Log.v(TAG, "Copied " + mLang + " traineddata");
            } catch (IOException e) {
                //showDialogFragment(SD_ERR_DIALOG, "sd_err_dialog");
            }
        }

        else{

            // Checking if Network is available
            if (!isNetworkAvailable(this)){
                //showDialogFragment(NETWORK_ERR_DIALOG, "network_err_dialog");
            }
            else {
                // Shows a dialog with File dimension. When user click on OK download starts. If he press Cancel revert to english language (like NETWORK ERROR)
                //showDialogFragment(CONTINUE_DIALOG, "continue_dialog");
            }
        }
    }
    else {
        //Toast.makeText(mThis, getString(R.string.selected_language) + " " + mLangArray[mLangID], Toast.LENGTH_SHORT).show();
    }
    return mDataDir;
}

我已多次调试它，并且位图正确地传输到detectText方法。语言数据文件（tessdata）存在于手机上，并且它们的路径也是正确的。

有人知道这里有什么问题吗？

Answer 1

您正在使用OCR引擎模式枚举值在setTessData（）方法中设置页面分段。

setTessData() {
    ...
    tessBaseAPI.setPageSegMode(TessBaseAPI.OEM_TESSERACT_ONLY);
}

根据您尝试检测字符的图像类型，设置适当的页面分割模式有助于检测字符。

例如：

tessBaseAPI.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO);

TessBaseApi.java中存在各种其他页面分段值：

/** Page segmentation mode. */
public static final class PageSegMode {
    /** Orientation and script detection only. */
    public static final int PSM_OSD_ONLY = 0;

    /** Automatic page segmentation with orientation and script detection. (OSD) */
    public static final int PSM_AUTO_OSD = 1;

    /** Fully automatic page segmentation, but no OSD, or OCR. */
    public static final int PSM_AUTO_ONLY = 2;

    /** Fully automatic page segmentation, but no OSD. */
    public static final int PSM_AUTO = 3;

    /** Assume a single column of text of variable sizes. */
    public static final int PSM_SINGLE_COLUMN = 4;

    /** Assume a single uniform block of vertically aligned text. */
    public static final int PSM_SINGLE_BLOCK_VERT_TEXT = 5;

    /** Assume a single uniform block of text. (Default.) */
    public static final int PSM_SINGLE_BLOCK = 6;

    /** Treat the image as a single text line. */
    public static final int PSM_SINGLE_LINE = 7;

    /** Treat the image as a single word. */
    public static final int PSM_SINGLE_WORD = 8;

    /** Treat the image as a single word in a circle. */
    public static final int PSM_CIRCLE_WORD = 9;

    /** Treat the image as a single character. */
    public static final int PSM_SINGLE_CHAR = 10;

    /** Find as much text as possible in no particular order. */
    public static final int PSM_SPARSE_TEXT = 11;

    /** Sparse text with orientation and script detection. */
    public static final int PSM_SPARSE_TEXT_OSD = 12;

    /** Number of enum entries. */
    public static final int PSM_COUNT = 13;
}

您可以尝试使用不同的页面分段枚举值，并查看哪种值可以获得最佳结果。

Tesseract ocr返回空字符串

1 个答案: