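I have been writing an OCR program that takes a photo containing text (in this particular case, a driver's license) plus a first name and a last name as parameters.
Once the software has read the ID photo, I search the recognized text for the first and last name. Unfortunately, because the image quality can be very low, the recognized name is sometimes quite far off.
Is there a way to find a SIMILAR needle in a haystack? That is, to find any occurrence that resembles the first or last name? For example: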
Needle: campbell
Haystack:
operaioxsllcence
gcltdriver
exries13NOV2020
carnpbeiljtttj
...
The string that is close enough here is "carnpbeil".
This is what I am using right now, and it only helps in very specific cases:
private bool SourceContains(string haystack, string needle)
{
    // Try the raw text first, then one OCR look-alike substitution at a time.
    return haystack.Contains(needle) ||
        haystack.Replace("l", "i").Contains(needle) ||
        haystack.Replace("i", "l").Contains(needle) ||
        haystack.Replace("0", "o").Contains(needle) ||
        haystack.Replace("o", "0").Contains(needle) ||
        haystack.Replace("j", "d").Contains(needle) ||
        haystack.Replace("d", "j").Contains(needle) ||
        haystack.Replace("i", "j").Contains(needle) ||
        haystack.Replace("j", "i").Contains(needle) ||
        haystack.Replace("e", "f").Contains(needle) ||
        haystack.Replace("f", "e").Contains(needle) ||
        haystack.Replace("r", "p").Contains(needle) ||
        haystack.Replace("p", "r").Contains(needle) ||
        haystack.Replace("s", "r").Contains(needle) ||
        haystack.Replace("r", "s").Contains(needle) ||
        haystack.Replace("r", "n").Contains(needle) ||
        haystack.Replace("n", "r").Contains(needle) ||
        haystack.Replace("k", "n").Contains(needle) ||
        haystack.Replace("n", "k").Contains(needle) ||
        haystack.Replace("h", "n").Contains(needle) ||
        haystack.Replace("n", "h").Contains(needle) ||
        haystack.Replace("k", "ll").Contains(needle) ||
        haystack.Replace("ll", "k").Contains(needle) ||
        haystack.Replace("ci", "d").Contains(needle) ||
        haystack.Replace("d", "ci").Contains(needle) ||
        haystack.Replace("cl", "d").Contains(needle) ||
        haystack.Replace("d", "cl").Contains(needle) ||
        haystack.Replace("m", "in").Contains(needle) ||
        haystack.Replace("in", "m").Contains(needle) ||
        haystack.Replace("rn", "m").Contains(needle) ||
        haystack.Replace("m", "rn").Contains(needle);
}
Answer 0 (score: 0)
For each word in haystack, compute the Levenshtein distance to needle. The word with the smallest distance is most likely your needle. See this question for an implementation.
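As a rough illustration of the idea, here is a minimal sketch in C#. The class name `FuzzyMatch` and the helpers `SubstringDistance` and `ContainsSimilar` are names made up for this example, not part of any library. `Levenshtein` is the standard edit distance. `SubstringDistance` is a variant (not what the answer literally describes) that lets the match start and end anywhere inside a haystack line, which may help when OCR glues extra characters onto the name, as in "carnpbeiljtttj". The threshold passed to `ContainsSimilar` is an assumption you would need to tune on real data.

```csharp
using System;

public static class FuzzyMatch
{
    // Standard Levenshtein distance: insertions, deletions and
    // substitutions each cost 1. Uses two rolling rows.
    public static int Levenshtein(string a, string b)
    {
        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    // Hypothetical helper: smallest edit distance between needle and ANY
    // substring of haystack. Same recurrence as above, except that starting
    // anywhere in haystack is free (cell 0 stays 0) and the match may end
    // at any position (minimum over the last cell of every column).
    public static int SubstringDistance(string haystack, string needle)
    {
        int m = needle.Length;
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int i = 0; i <= m; i++) prev[i] = i;

        int best = prev[m]; // distance to the empty substring
        foreach (char h in haystack)
        {
            curr[0] = 0; // a match may begin at this haystack position
            for (int i = 1; i <= m; i++)
            {
                int cost = needle[i - 1] == h ? 0 : 1;
                curr[i] = Math.Min(Math.Min(curr[i - 1] + 1, prev[i] + 1),
                                   prev[i - 1] + cost);
            }
            best = Math.Min(best, curr[m]);
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return best;
    }

    // Hypothetical helper: true if some substring of haystack is within
    // maxDistance edits of needle. maxDistance is an assumed tuning knob.
    public static bool ContainsSimilar(string haystack, string needle, int maxDistance)
    {
        return SubstringDistance(haystack, needle) <= maxDistance;
    }
}
```

With the example from the question, `FuzzyMatch.SubstringDistance("carnpbeiljtttj", "campbell")` comes out lower than for the other haystack lines, so picking the line with the smallest distance (or accepting anything under a threshold scaled to the name's length) would select it. Normalizing case first, for instance with `ToLowerInvariant()`, is probably worthwhile since the OCR output mixes upper and lower case.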