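I am using Tesseract 3.02 and want to ignore, or replace with a space, any symbol whose confidence value is low, for example below 10. In effect, this is thresholding symbols by their confidence values.
The API only provides a ResultIterator with level options such as RIL_SYMBOL, RIL_PARA, and RIL_TEXTLINE. I want to iterate over every symbol on every page.
I want something like this: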
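tesseract::ResultIterator *it = tess.GetIterator();
std::string text;
do {
    if (it->Empty(tesseract::RIL_PARA))
        continue;
    char *para_text = it->GetUTF8Text(tesseract::RIL_PARA);
    // PSEUDOCODE: now iterate over every symbol in this paragraph {
        // some_iterator and symbol_confidence are placeholders for the part I am missing
        char *symbol = some_iterator->GetUTF8Text(tesseract::RIL_SYMBOL);
        if (symbol_confidence < 35)
            text += " ";      // low confidence: replace the symbol with a space
        else
            text += symbol;   // confident enough: keep the symbol
        delete[] symbol;      // GetUTF8Text returns a buffer that must be freed with delete[]
    // }
    delete[] para_text;
} while (it->Next(tesseract::RIL_PARA));
// Copy the filtered text into a heap-allocated C string
char *result = new char[text.length() + 1];
strcpy(result, text.c_str());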
Is something like this possible?
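
To make the goal concrete, here is a minimal sketch of just the thresholding step, kept independent of Tesseract so it can be compiled on its own. `SymbolConf` and `MaskLowConfidence` are hypothetical names introduced for illustration, not part of the Tesseract API. The sketch assumes each symbol arrives as UTF-8 text paired with a confidence on a 0-100 scale.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type (not part of Tesseract): one recognized symbol as
// UTF-8 text, paired with its confidence on an assumed 0-100 scale.
using SymbolConf = std::pair<std::string, float>;

// Builds the output text, replacing every symbol whose confidence is strictly
// below `threshold` with a single space and keeping all other symbols as-is.
std::string MaskLowConfidence(const std::vector<SymbolConf>& symbols,
                              float threshold) {
    std::string text;
    for (const SymbolConf& s : symbols) {
        if (s.second < threshold) {
            text += " ";       // low confidence: emit a space instead
        } else {
            text += s.first;   // confident enough: keep the symbol
        }
    }
    return text;
}
```

With Tesseract, the pairs would presumably be collected by advancing the iterator with `Next(tesseract::RIL_SYMBOL)` and reading `GetUTF8Text(tesseract::RIL_SYMBOL)` together with `Confidence(tesseract::RIL_SYMBOL)` at each step. I have not verified that this behaves as expected in 3.02, so treat that part as an assumption.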