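How to restrict the results of tess-two (the Tesseract and Leptonica libraries)
I want Tesseract to restrict its results.
For example:
The recognized result is "asn *& bhDK 1234 UDaks&%^ jdg", but I only need "DK1234UD".
So it should not return lowercase letters, line breaks, or spaces. It should return only uppercase letters and digits.
I am using Java.
This is the recognition code: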
TessBaseAPI baseApi = new TessBaseAPI();
baseApi.setPageSegMode(TessBaseAPI.OEM_TESSERACT_CUBE_COMBINED);
baseApi.setPageSegMode(PageSegMode.PSM_AUTO_OSD);
baseApi.setPageSegMode(PageSegMode.PSM_SINGLE_LINE);
baseApi.setDebug(true);
baseApi.init(DATA_PATH, lang);
//setImage
baseApi.setImage(bmpOtsu);
//set whitelist
String whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, whitelist);
//variable for recognizing
String recognizedText = baseApi.getUTF8Text();
String resultTxt = recognizedText;
baseApi.end();
if ( lang.equalsIgnoreCase("eng") ) {
recognizedText = recognizedText.replaceAll("[^A-Z0-9]", " ");
}
Can someone tell me how to do this? What should I add here?
Answer 0 (score: 2)
If you are using an instance of TessBaseAPI, you can call setVariable() with the constant VAR_CHAR_WHITELIST:
String whiteList = "ABCD...XYZ1234567890";
tessBaseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST,whiteList);
You can adjust the whitelist as needed. For example, if you want to ignore all letters other than D and K, set it to:
String whiteList = "DK1234567890";
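You may still need to do some further string manipulation on the result, such as removing letters from the end of it. Based on your example, you might get this result (using the second whitelist):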
DK1234UD
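The post-processing step mentioned above can be sketched in plain Java, independent of tess-two. This is a minimal illustration, assuming the OCR text is already available as a String; the class and method names (`WhitelistFilter`, `keepUpperAndDigits`) are made up for this example and are not part of the library.

```java
final class WhitelistFilter {
    // Removes every character that is not an uppercase ASCII letter or a digit.
    // Spaces and line breaks are dropped as well, because they match [^A-Z0-9].
    static String keepUpperAndDigits(String ocrText) {
        return ocrText.replaceAll("[^A-Z0-9]", "");
    }

    public static void main(String[] args) {
        String raw = "asn *& bhDK 1234 UDaks&%^ jdg"; // sample text from the question
        System.out.println(keepUpperAndDigits(raw));  // prints DK1234UD
    }
}
```

Replacing with an empty string rather than a space is what keeps the result contiguous, which seems to be what the question is after.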
Edit:
To pull the part you need out of a result such as DK123455UD, you can use substring():
String result = "DK123455UD";
int pos = result.indexOf("DK");
String finalResult = result.substring(pos,pos+8);
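Note that indexOf() returns -1 when the marker is missing, and substring() throws if the string is shorter than pos+8. A slightly more defensive variant might look like the sketch below; `SafeSubstring` and `takeFrom` are hypothetical names chosen for illustration only.

```java
final class SafeSubstring {
    // Returns up to maxLength characters starting at the first occurrence of marker,
    // or an empty string when marker does not occur in text.
    static String takeFrom(String text, String marker, int maxLength) {
        int pos = text.indexOf(marker);
        if (pos < 0) {
            return ""; // marker not found, nothing to extract
        }
        // Clamp the end index so a short string cannot cause an exception.
        int end = Math.min(pos + maxLength, text.length());
        return text.substring(pos, end);
    }

    public static void main(String[] args) {
        System.out.println(takeFrom("DK123455UD", "DK", 8)); // prints DK123455
        System.out.println(takeFrom("XXDK12", "DK", 8));     // prints DK12
    }
}
```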
Edit:
Like this?
String whitelist = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, whitelist);
//setImage
baseApi.setImage(bmpOtsu);
//variable for recognizing
String recognizedText = baseApi.getUTF8Text();
//
int get8digits = recognizedText.indexOf("D");
String resultTxt = recognizedText.substring(get8digits, get8digits+8);
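As an alternative to indexOf() plus substring(), a regular expression can express "a D followed by seven more whitelisted characters" in one step. The sketch below assumes that eight-character format purely for illustration, based on the example in the question; `CodeFinder` and `firstCode` are hypothetical names, and the pattern would need adjusting to the real data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CodeFinder {
    // Assumed format: "D" followed by exactly seven uppercase letters or digits.
    private static final Pattern CODE = Pattern.compile("D[A-Z0-9]{7}");

    // Returns the first match, or null when no complete eight-character code is present.
    static String firstCode(String cleanedText) {
        Matcher m = CODE.matcher(cleanedText);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstCode("XYDK1234UD99")); // prints DK1234UD
        System.out.println(firstCode("ABD12"));        // prints null
    }
}
```

Unlike the substring approach, this returns null instead of throwing or yielding a partial string when fewer than eight characters follow the D.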
Answer 1 (score: 1)
Thanks to @Yazan for the answer; it works. I built on it, and this is my code:
TessBaseAPI baseApi = new TessBaseAPI();
// Treat the image as a single line of text (only the last setPageSegMode() call takes effect)
baseApi.setPageSegMode(PageSegMode.PSM_SINGLE_LINE);
baseApi.setDebug(true);
baseApi.init(DATA_PATH, lang);
//set variable
String whiteList = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
String blackList = "\\s";
baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, whiteList);
baseApi.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, blackList);
//setImage
//baseApi.setImage(bmpOtsu, w, h, 8, (Integer) null);
baseApi.setImage(bmpOtsu);
//variable for recognizing
String recognizedText = baseApi.getUTF8Text();
recognizedText = recognizedText.replaceAll(blackList, "");//remove space
String resultTxt = recognizedText;
//
baseApi.end();
Log.v(TAG, "OCRED TEXT: " + recognizedText);
if (lang.equalsIgnoreCase("eng")) {
    int get8digits = recognizedText.indexOf("D");
    // indexOf() returns -1 when there is no "D"; guard so substring() cannot throw
    String loop = get8digits >= 0 ? recognizedText.substring(get8digits) : "";
    if (get8digits >= 0 && loop.length() >= 8) {
        // at least 8 characters from the first "D": keep exactly 8
        Log.w(TAG, "OPSI 1"+"\n"+"Length: "+loop.length()+"\n"+"Values: "+loop);
        recognizedText = recognizedText.substring(get8digits, get8digits+8);
    } else if (get8digits >= 0) {
        // fewer than 8 characters from the first "D": keep what is there
        Log.w(TAG, "OPSI 2"+"\n"+"Length: "+loop.length()+"\n"+"Values: "+loop);
        recognizedText = loop;
    } else {
        // no "D" found: blank out every uppercase letter and digit
        Log.w(TAG, "OPSI 3"+"\n"+"Length: "+loop.length()+"\n"+"Values: "+loop);
        recognizedText = recognizedText.replaceAll("[A-Z0-9]"," ");
    }
}
I hope this helps someone.