C#: Finding a similar needle in a haystack (for OCR)

Time: 2017-09-09 19:31:45

Tags: c# string ocr

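I've been building an OCR program that takes a photo containing text (in this particular case, a driver's license) along with a first name and a last name as parameters.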

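Once the software has read the ID photo, I search the recognized text for the first and last name. Unfortunately, because the image quality can be very low, the name sometimes comes out quite garbled.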

Is there a way to find a SIMILAR needle in the haystack? That is, to find any occurrence that resembles the first or last name? For example:

Needle: campbell

Haystack: 
operaioxsllcence 
gcltdriver 
exries13NOV2020
carnpbeiljtttj
...

The string that is close enough here is "carnpbeil".

This is what I'm using right now, and it only helps in very specific cases:

private bool SourceContains(string haystack, string needle)
{
    // Try the haystack as-is, then with one common OCR confusion swapped at a time.
    return haystack.Contains(needle) ||
        haystack.Replace("l", "i").Contains(needle) ||
        haystack.Replace("i", "l").Contains(needle) ||
        haystack.Replace("0", "o").Contains(needle) ||
        haystack.Replace("o", "0").Contains(needle) ||
        haystack.Replace("j", "d").Contains(needle) ||
        haystack.Replace("d", "j").Contains(needle) ||
        haystack.Replace("i", "j").Contains(needle) ||
        haystack.Replace("j", "i").Contains(needle) ||
        haystack.Replace("e", "f").Contains(needle) ||
        haystack.Replace("f", "e").Contains(needle) ||
        haystack.Replace("r", "p").Contains(needle) ||
        haystack.Replace("p", "r").Contains(needle) ||
        haystack.Replace("s", "r").Contains(needle) ||
        haystack.Replace("r", "s").Contains(needle) ||
        haystack.Replace("r", "n").Contains(needle) ||
        haystack.Replace("n", "r").Contains(needle) ||
        haystack.Replace("k", "n").Contains(needle) ||
        haystack.Replace("n", "k").Contains(needle) ||
        haystack.Replace("h", "n").Contains(needle) ||
        haystack.Replace("n", "h").Contains(needle) ||
        haystack.Replace("k", "ll").Contains(needle) ||
        haystack.Replace("ll", "k").Contains(needle) ||
        haystack.Replace("ci", "d").Contains(needle) ||
        haystack.Replace("d", "ci").Contains(needle) ||
        haystack.Replace("cl", "d").Contains(needle) ||
        haystack.Replace("d", "cl").Contains(needle) ||
        haystack.Replace("m", "in").Contains(needle) ||
        haystack.Replace("in", "m").Contains(needle) ||
        haystack.Replace("rn", "m").Contains(needle) ||
        haystack.Replace("m", "rn").Contains(needle);
}

1 Answer:

Answer 0: (score: 0)

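Compute the Levenshtein distance between the needle and each word in the haystack. The word with the smallest distance is most likely your needle. See this question for an implementation.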
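As a rough sketch of this idea (not the implementation from the linked question), the C# below computes the standard Levenshtein distance. Because an OCR line such as `carnpbeiljtttj` carries trailing junk that inflates a whole-word distance, it also includes a swapped-in variant, approximate substring matching, in which the needle may match any substring of the line. The names `FuzzyNeedle`, `SubstringDistance` and `ClosestLine`, and the `maxDistance` threshold, are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FuzzyNeedle
{
    // Standard Levenshtein distance between two whole strings (two-row DP).
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            var tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    // Assumed variant: smallest edit distance between the needle and ANY
    // substring of text. Row 0 stays all zeros so a match may start anywhere,
    // and the minimum over the last row lets it end anywhere.
    public static int SubstringDistance(string text, string needle)
    {
        var prev = new int[text.Length + 1];
        var curr = new int[text.Length + 1];

        for (int i = 1; i <= needle.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= text.Length; j++)
            {
                int cost = needle[i - 1] == text[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            var tmp = prev; prev = curr; curr = tmp;
        }
        return prev.Min();
    }

    // Returns the (trimmed) line closest to the needle, ignoring case,
    // or null when even the best line is more than maxDistance edits away.
    public static string ClosestLine(IEnumerable<string> lines, string needle, int maxDistance)
    {
        string best = null;
        int bestDistance = int.MaxValue;
        string target = needle.ToLowerInvariant();

        foreach (string line in lines)
        {
            string candidate = line.Trim();
            int d = SubstringDistance(candidate.ToLowerInvariant(), target);
            if (d < bestDistance)
            {
                bestDistance = d;
                best = candidate;
            }
        }
        return bestDistance <= maxDistance ? best : null;
    }
}
```

With the haystack from the question, `FuzzyNeedle.ClosestLine(lines, "campbell", 3)` should pick the `carnpbeiljtttj` line, since `carnpbeil` is three edits away from `campbell` (m replaced by r, an inserted n, l replaced by i). The threshold needs tuning against real scans: too high a value will accept unrelated text, and short names tolerate fewer edits than long ones.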