How to dynamically find a person's name in a string with Python

Date: 2019-03-11 13:55:03

Tags: python regex

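I am working on a project in which I have to use speech-to-text output as the input for determining who to call. Speech-to-text can produce some unexpected results, however, so I would like to do some dynamic matching on the string. I am starting small by trying to match a single name. My name is Nick (尼克·韦斯), and I try to match my name against the speech-to-text output, but I would also like it to match when the text comes out as, for example, Nik or something similar. Ideally, I would like something that matches everything in which only one letter is wrong, so that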

Nick ick nik nic nck

all match my name. The simple code I currently have is:

  def user_to_call(s):
      # Test each spelling separately: `"NICK" or "NIK" in s.upper()` is always true.
      if "NICK" in s.upper() or "NIK" in s.upper():
          return "Nick"
      return None

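For a 4-letter name it is feasible to put every possibility into the filter, but for a 12-letter name that becomes excessive, and I am sure this can be done much more efficiently.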

3 answers:

Answer 0: (score: 1)

You need to use the Levenshtein distance.

A Python implementation is available in nltk:

import nltk
nltk.edit_distance("humpty", "dumpty")
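To apply this to the question, one option is to split the recognized text into words and accept any word whose edit distance to the name is at most 1. The following is only a minimal sketch under that assumption. It uses a small pure-Python Levenshtein function in place of nltk so that it runs without extra packages, and `levenshtein` and `find_name` are hypothetical helper names, not part of any library.

```python
def levenshtein(a, b):
    """Return the edit distance (insertions, deletions, substitutions) between a and b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def find_name(text, names, max_distance=1):
    """Return the first name within max_distance of any word in text, else None."""
    for word in text.lower().split():
        for name in names:
            if levenshtein(word, name.lower()) <= max_distance:
                return name
    return None


print(find_name("please call nik now", ["Nick"]))     # matches via "nik"
print(find_name("please call nickey now", ["Nick"]))  # no word is close enough
```

Note that a tolerance of one edit also accepts ordinary words such as "pick" or "nice", so for short names it may be worth restricting the check to the part of the sentence where a name is expected.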

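Answer 1: (score: 0)

As I understand it, you are not looking for general fuzzy matching (since you have not accepted the other replies). If you only want to evaluate what you specified in your request, the code is below. I have added a few extra conditions that print appropriate messages. Feel free to remove them.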

def wordmatch(baseword, wordtoMatch, lengthOfMatch):
    lis_of_baseword = list(baseword.lower())
    lis_of_wordtoMatch = list(wordtoMatch.lower())
    matched = 0  # characters matched so far (renamed to avoid shadowing the built-in sum)
    for index_i, i in enumerate(lis_of_wordtoMatch):
        for index_j, j in enumerate(lis_of_baseword):
            if i in lis_of_baseword:
                # count a character only at the same or a later position in baseword
                if i == j and index_i <= index_j:
                    matched = matched + 1
                    break
            else:
                print("word to match has characters which are not in baseword")
                return 0
    if matched >= lengthOfMatch and len(wordtoMatch) <= len(baseword):
        return 1
    elif matched >= lengthOfMatch and len(wordtoMatch) > len(baseword):
        print("word to match has no of characters more than that of baseword")
        return 0
    else:
        return 0

base = "Nick"
tomatch = ["Nick", "ick", "nik", "nic", "nck", "nickey","njick","nickk","nickn"]
wordlength_match = 3 # how many characters of the base word must match; in your case, 3

for t_word in tomatch:
    print(wordmatch(base,t_word,wordlength_match))

The output looks like this:

1
1
1
1
1
word to match has characters which are not in baseword
0
word to match has characters which are not in baseword
0
word to match has no of characters more than that of baseword
0
word to match has no of characters more than that of baseword
0

Let me know if this serves your purpose.

Answer 2: (score: 0)

What you basically need is fuzzy string matching; see:

https://en.wikipedia.org/wiki/Approximate_string_matching

https://www.datacamp.com/community/tutorials/fuzzy-string-python

Based on that, you can check how similar the input is to the entries in your dictionary:

 from fuzzywuzzy import fuzz

 name = "nick"
 tomatch = ["Nick", "ick", "nik", "nic", "nck", "nickey", "njick", "nickk", "nickn"]
 for candidate in tomatch:  # renamed from `str` to avoid shadowing the built-in
     ratio = fuzz.ratio(candidate.lower(), name.lower())
     print(ratio)

This code produces the following output:

100
86
86
86
86
80
89
89
89

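You will have to experiment with different ratio thresholds and check which one fits your requirement of only one missing letter.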
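One way to explore the threshold is to compute the score for each kind of single-letter error. The sketch below assumes the standard library's `difflib.SequenceMatcher` as a stand-in for `fuzz.ratio`, so that it runs without installing fuzzywuzzy. Treat the equivalence as an assumption rather than a guarantee for every input. `similarity` and `best_match` are hypothetical helper names, and the default threshold of 75 is purely illustrative.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score, ignoring case (stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_match(word, names, threshold=75):
    """Return the name most similar to word if its score reaches threshold, else None."""
    if not names:
        return None
    score, name = max((similarity(word, name), name) for name in names)
    return name if score >= threshold else None


# Deletion, substitution, insertion, and a two-letter insertion on "nick"
for candidate in ["ick", "nack", "nickk", "nickey"]:
    print(candidate, similarity(candidate, "nick"))

print(best_match("nik", ["Nick", "Sarah"]))
print(best_match("bob", ["Nick", "Sarah"]))
```

For a 4-letter name, a single substituted letter ("nack") scores 75, which is lower than the 80 that "nickey" gets with two extra letters. A ratio threshold alone therefore cannot express "exactly one letter wrong" for short names; an edit-distance limit states that rule more directly.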