Android Tesseract API: recognizing non-words

Time: 2015-12-07 20:50:26

Tags: android ocr tesseract tess-two

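I'm trying to recognize random characters on Android using the tess-two API. I have a printed sheet of paper with the string "5XqaLB".
When I point the camera at the string to recognize it, I get results like the following: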

baseApi.setVariable("load_system_dawg", "0");
baseApi.setVariable("load_freq_dawg", "0");
baseApi.setVariable("load_punc_dawg", "0");
baseApi.setVariable("load_number_dawg", "0");
baseApi.setVariable("load_unambig_dawg", "0");
baseApi.setVariable("load_bigram_dawg", "0");
baseApi.setVariable("load_fixed_length_dawgs", "0");
baseApi.setVariable("segment_penalty_garbage", "0");
baseApi.setVariable("segment_penalty_dict_nonword", "0");
baseApi.setVariable("segment_penalty_dict_frequent_word", "0");
baseApi.setVariable("segment_penalty_dict_case_ok", "0");
baseApi.setVariable("segment_penalty_dict_case_bad", "0");
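The settings above are applied one call at a time, with no check of whether each one took effect. A minimal sketch in plain Java of applying the same twelve settings from a table and collecting any that were rejected. It assumes that tess-two's `TessBaseAPI.setVariable(String, String)` returns `false` when Tesseract does not accept a setting (for example, an unknown name or one that cannot be changed after `init()`), so on a device it could be called as `applyAll(baseApi::setVariable, dictionaryOffSettings())`. The class and method names are hypothetical. To keep the sketch runnable without the library, `main` uses a fake setter that rejects every `load_*` name. That pattern is made up for the demo and is not observed Tesseract behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

public class TessVariables {

    // The dictionary-related settings from the question, in the same order.
    static Map<String, String> dictionaryOffSettings() {
        Map<String, String> vars = new LinkedHashMap<>();
        String[] dawgs = {"load_system_dawg", "load_freq_dawg", "load_punc_dawg",
                "load_number_dawg", "load_unambig_dawg", "load_bigram_dawg",
                "load_fixed_length_dawgs"};
        for (String d : dawgs) vars.put(d, "0");
        String[] penalties = {"segment_penalty_garbage", "segment_penalty_dict_nonword",
                "segment_penalty_dict_frequent_word", "segment_penalty_dict_case_ok",
                "segment_penalty_dict_case_bad"};
        for (String p : penalties) vars.put(p, "0");
        return vars;
    }

    // Applies every setting through the given setter and returns the names it rejected.
    // Hypothetical usage on Android: applyAll(baseApi::setVariable, dictionaryOffSettings())
    static List<String> applyAll(BiPredicate<String, String> setter, Map<String, String> vars) {
        List<String> rejected = new ArrayList<>();
        for (Map.Entry<String, String> e : vars.entrySet()) {
            if (!setter.test(e.getKey(), e.getValue())) rejected.add(e.getKey());
        }
        return rejected;
    }

    public static void main(String[] args) {
        // Fake stand-in for TessBaseAPI.setVariable: rejects load_* names for the demo only.
        BiPredicate<String, String> fake = (name, value) -> !name.startsWith("load_");
        List<String> rejected = applyAll(fake, dictionaryOffSettings());
        System.out.println("Rejected: " + rejected);
    }
}
```

Logging the rejected names on the device would show directly which of the dictionary settings the installed Tesseract version actually accepted.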

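I think this happens because Tesseract tries to guess a word from the recognized characters. I have searched a lot but couldn't find a solution. Does anyone have an idea how to avoid these substitutions?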

I have already tried whitelists, blacklists and config files:

build.gradle

Can anyone suggest how to make Tesseract recognize just the individual characters?

1 answer:

Answer 0 (score: -1)

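I managed to solve a similar problem. In my case I was recognizing license plate characters. Instead of running Tesseract on the whole plate image, I preprocessed it to separate the characters, so I could run Tesseract on each character individually. My configuration variables: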

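// Requires: import com.googlecode.tesseract.android.TessBaseAPI;
// TESSBASE_PATH (parent folder of "tessdata") and DEFAULT_DIC (language code, e.g. "eng")
// are constants defined elsewhere.
private TessBaseAPI createTessBaseApi() {
    final TessBaseAPI baseApi = new TessBaseAPI();
    baseApi.init(TESSBASE_PATH, DEFAULT_DIC, TessBaseAPI.OEM_DEFAULT);
    baseApi.setDebug(true);
    // Restrict output to upper-case letters and digits (license plate characters).
    baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890");

    // Every image passed in contains exactly one pre-segmented character.
    baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_CHAR);

    // Turn off the word dictionaries (DAWGs), keeping only the number DAWG.
    // Tesseract may only honor some load_*_dawg options at init time;
    // setVariable() returns false when a setting is not accepted.
    baseApi.setVariable("load_system_dawg", TessBaseAPI.VAR_FALSE);
    baseApi.setVariable("load_freq_dawg", TessBaseAPI.VAR_FALSE);
    baseApi.setVariable("load_punc_dawg", TessBaseAPI.VAR_FALSE);
    baseApi.setVariable("load_number_dawg", TessBaseAPI.VAR_TRUE);
    baseApi.setVariable("load_unambig_dawg", TessBaseAPI.VAR_FALSE);
    baseApi.setVariable("load_bigram_dawg", TessBaseAPI.VAR_FALSE);
    baseApi.setVariable("load_fixed_length_dawgs", TessBaseAPI.VAR_FALSE);

    // The segment_penalty_* options are numeric, so "0" is used instead of VAR_FALSE ("F").
    baseApi.setVariable("segment_penalty_garbage", "0");
    baseApi.setVariable("segment_penalty_dict_nonword", "0");
    baseApi.setVariable("segment_penalty_dict_frequent_word", "0");
    baseApi.setVariable("segment_penalty_dict_case_ok", "0");
    baseApi.setVariable("segment_penalty_dict_case_bad", "0");
    return baseApi;
}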
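The answer does not say how the characters were separated. One common approach is a vertical projection profile: on a binarized image, any column with no ink pixels is treated as a gap between two characters. This sketch illustrates that technique and is an assumption, not the answerer's method. It assumes a rectangular `boolean[][]` image with `ink[y][x]` set for text pixels, and characters that do not touch or overlap horizontally. `segmentColumns` and `fromAscii` are hypothetical helpers. On Android, each returned column range could then be cropped with `Bitmap.createBitmap(source, x, y, width, height)` and passed to a `TessBaseAPI` configured with `PSM_SINGLE_CHAR` through `setImage(...)` and `getUTF8Text()`.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnSegmenter {

    // Splits a binarized, rectangular image into character column ranges.
    // A column with no ink pixels separates two characters.
    // Returns {start, end} pairs with end exclusive; runs narrower than minWidth are dropped.
    static List<int[]> segmentColumns(boolean[][] ink, int minWidth) {
        List<int[]> segments = new ArrayList<>();
        int height = ink.length;
        int width = height == 0 ? 0 : ink[0].length;
        int start = -1; // -1 means "not currently inside a character"
        // x == width acts as a virtual empty column that closes a trailing character.
        for (int x = 0; x <= width; x++) {
            boolean hasInk = false;
            if (x < width) {
                for (int y = 0; y < height && !hasInk; y++) hasInk = ink[y][x];
            }
            if (hasInk && start < 0) {
                start = x; // entering a character
            } else if (!hasInk && start >= 0) {
                if (x - start >= minWidth) segments.add(new int[] {start, x});
                start = -1; // leaving a character
            }
        }
        return segments;
    }

    // Builds a binary image from text rows where '#' marks an ink pixel (for testing).
    static boolean[][] fromAscii(String... rows) {
        boolean[][] ink = new boolean[rows.length][];
        for (int y = 0; y < rows.length; y++) {
            ink[y] = new boolean[rows[y].length()];
            for (int x = 0; x < rows[y].length(); x++) ink[y][x] = rows[y].charAt(x) == '#';
        }
        return ink;
    }

    public static void main(String[] args) {
        boolean[][] img = fromAscii(
                "##..#.###",
                "##..#...#",
                "##..#.###");
        for (int[] s : segmentColumns(img, 1)) {
            System.out.println("columns " + s[0] + " to " + (s[1] - 1));
        }
    }
}
```

`minWidth` drops very narrow runs, which are more likely to be noise than characters. Touching or italic characters would need a more robust segmentation than a plain column projection.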