Blacklist and whitelist in tesseract on Android

Posted: 2014-11-06 15:29:24

Tags: android ocr tesseract whitelist blacklist

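I am developing an Android application that recharges credit by taking a picture of the card with the phone's camera or choosing one from the gallery. I use the tesseract library for this, with a whitelist and a blacklist so that only digits are recognized, but it does not work as expected.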

The image I use contains only these two lines:

PIN code

41722757649786

The result I get before starting the recharge activity is:

718 200

41722757649786

I only want to recognize the digits, without the letters, and without using a cropper.

public void initTess() {
    // Release the previous native instance, if any, before creating a new one.
    if (mBaseApi != null)
        mBaseApi.end();

    mBaseApi = new TessBaseAPI();
    mBaseApi.setDebug(false);

    mBaseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_OSD_ONLY);
    mBaseApi.init(mDataDir + File.separator, "eng");
    // Allow only digits (whitelist) and forbid all ASCII letters (blacklist).
    mBaseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "0123456789");
    mBaseApi.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
}
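Typing all 52 letters of the blacklist by hand is error-prone: a single dropped letter silently leaves that letter allowed. A minimal sketch, assuming plain ASCII letters are what should be blocked, builds both strings in code instead. `CharSets`, `digits()` and `asciiLetters()` are hypothetical helper names, not part of tess-two.

```java
// Hypothetical helper (not part of tess-two): builds the whitelist/blacklist
// strings in code so that no character can be accidentally left out.
final class CharSets {
    private CharSets() {}

    /** Returns the ten ASCII digits "0123456789". */
    static String digits() {
        return charRange('0', '9');
    }

    /** Returns 'A'..'Z' followed by 'a'..'z' (the 52 ASCII letters). */
    static String asciiLetters() {
        return charRange('A', 'Z') + charRange('a', 'z');
    }

    // Concatenates every char from first to last, inclusive.
    private static String charRange(char first, char last) {
        StringBuilder sb = new StringBuilder();
        for (char c = first; c <= last; c++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(digits());                // 0123456789
        System.out.println(asciiLetters());          // ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
        System.out.println(asciiLetters().length()); // 52
    }
}
```

In initTess() the two calls could then read mBaseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, CharSets.digits()) and mBaseApi.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, CharSets.asciiLetters()).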

1 answer:

Answer 0 (score: 3)

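Setting the "tessedit_char_whitelist" variable (TessBaseAPI.VAR_CHAR_WHITELIST in your code) must be done before initialization, as stated in the FAQ: https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_recognize_only_digits? The same most likely applies to the blacklist.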

So change your code from this:

mBaseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_OSD_ONLY);
mBaseApi.init(mDataDir + File.separator, "eng");
mBaseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "0123456789");
mBaseApi.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");

to this:

mBaseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_OSD_ONLY);
// Set the character lists before init(), as the FAQ recommends.
mBaseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "0123456789");
mBaseApi.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
mBaseApi.init(mDataDir + File.separator, "eng");

That should do it.
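A whitelist only restricts which characters tesseract may output; it does not make it skip the letters in the image, so the "PIN code" line can still come back as look-alike digits (plausibly where "718 200" comes from). A complementary approach is to filter the recognized text in Java afterwards. The sketch below assumes the PIN is the longest run of digits in the result, which holds for the sample image in the question but is an assumption in general. PinExtractor, digitsOnly() and extractPin() are hypothetical names, not part of tess-two.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of tess-two): post-filters OCR output.
// Assumption: the PIN is the longest contiguous run of digits.
final class PinExtractor {
    // In Java regex, \d matches only the ASCII digits 0-9 by default.
    private static final Pattern DIGIT_RUN = Pattern.compile("\\d+");

    private PinExtractor() {}

    /** Removes every character that is not an ASCII digit. */
    static String digitsOnly(String ocrText) {
        return ocrText == null ? "" : ocrText.replaceAll("[^0-9]", "");
    }

    /** Returns the longest run of digits, or "" if there is none. */
    static String extractPin(String ocrText) {
        if (ocrText == null) {
            return "";
        }
        String best = "";
        Matcher m = DIGIT_RUN.matcher(ocrText);
        while (m.find()) {
            if (m.group().length() > best.length()) {
                best = m.group();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Sample result quoted in the question.
        String ocrText = "718 200\n\n41722757649786";
        System.out.println(extractPin(ocrText)); // 41722757649786
        System.out.println(digitsOnly(ocrText)); // 71820041722757649786
    }
}
```

On the device this would be applied to the string returned by mBaseApi.getUTF8Text(). Picking the longest run, rather than stripping non-digits, keeps the misread first line from being glued onto the PIN.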