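I'm doing fuzzy string matching. The matches come out the way I want, but the score I use to evaluate each match is important for reference, and I can't get my function to return the matches together with their respective scores. Below are the lists of strings I use for testing, the function, and the values it returns.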
import pandas as pd
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

data = ['HARRY LEHMAN', 'MICHAEL NELLIS', 'ALLIE CARTER', 'SCOTT GOODSTEIN',
        'ALLIE CARTER', 'KARIMA WILLIAMS', 'ALLIE CARTER', 'GARRY REEDER',
        'CHARLES COHEN']
data2 = ['HARRY LEHMAN', 'HANK LEHMAN', 'MICHAEL NELLIS', 'ALICE CARTER',
         'ALLIE CARTER', 'HARRY LEMAN', 'ALLIE CARTER', 'MIKE NELLIS',
         'A. CARTER', 'H. LEHMAN', 'CHARLES COHEN']
In this version I don't return the score, but you can see that it works well:
def match_names(options, keys):
    results = []
    key = []
    for i in keys:
        for j in options:
            match_score = fuzz.ratio(i, j)
            if match_score > 75 and j not in results:
                results += [j]
                key += [i]
    return pd.DataFrame(key, results)
Output:
match_names(data2, data)
HARRY LEHMAN HARRY LEHMAN
HANK LEHMAN HARRY LEHMAN
HARRY LEMAN HARRY LEHMAN
H. LEHMAN HARRY LEHMAN
MICHAEL NELLIS MICHAEL NELLIS
MIKE NELLIS MICHAEL NELLIS
ALICE CARTER ALLIE CARTER
ALLIE CARTER ALLIE CARTER
A. CARTER ALLIE CARTER
CHARLES COHEN CHARLES COHEN
But when I try to make the function return the score (match_score) as well, it breaks.
def matchStatements(options, keys):
    results = []
    key = []
    score = []
    for i in keys:
        for j in options:
            match_score = fuzz.ratio(i, j)
            if match_score > 75 and j not in results:
                score += [match_score]
                results += [j]
                key += [i]
    return pd.DataFrame(key, results, score)
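When I have the function print the scores it works fine, but I want each score paired with its match in the result, not a separate list that merely corresponds to it.
Any help would be greatly appreciated.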
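For reference, a likely cause of the failure: the positional parameters of pd.DataFrame are data, index, columns, so in pd.DataFrame(key, results, score) the score list is interpreted as column labels rather than as a second column of values. One possible way around this, sketched below, is to collect one record per match and build the frame from those records. The names match_with_scores, difflib_ratio, scorer and threshold are hypothetical, and the default scorer is a standard-library stand-in (difflib) used only so the sketch runs without third-party packages; it is not claimed to reproduce fuzz.ratio exactly.

```python
from difflib import SequenceMatcher


def difflib_ratio(a, b):
    # Stand-in scorer (assumption): a 0-100 similarity from the standard library.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def match_with_scores(options, keys, scorer=difflib_ratio, threshold=75):
    """Return one record per accepted option: the option, its key, and the score."""
    rows = []
    seen = set()  # options already matched, mirroring `j not in results`
    for key in keys:
        for option in options:
            score = scorer(key, option)
            if score > threshold and option not in seen:
                seen.add(option)
                rows.append({"match": option, "key": key, "score": score})
    return rows


rows = match_with_scores(
    ["HARRY LEHMAN", "HARRY LEMAN", "ALLIE CARTER"],
    ["HARRY LEHMAN"],
)
for row in rows:
    print(row["match"], row["key"], row["score"])
```

With fuzzywuzzy and pandas installed, passing scorer=fuzz.ratio and wrapping the result, as in pd.DataFrame(match_with_scores(data2, data, scorer=fuzz.ratio)), should give one row per match with match, key and score columns.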