ValueError: Shape of passed values is different from what the index implies - fuzzy string matching

Time: 2018-08-02 21:08:56

Tags: fuzzywuzzy

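I am doing fuzzy string matching. The matches come out the way I want, but the score I use to evaluate each match is important for reference, and I can't get my function to return the matches together with their respective scores. Below are the string lists I use to test the function, the function itself, and what it returns.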

import pandas as pd  # needed for pd.DataFrame in the functions below
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

data = ['HARRY LEHMAN', 'MICHAEL NELLIS', 'ALLIE CARTER', 'SCOTT GOODSTEIN',
        'ALLIE CARTER', 'KARIMA WILLIAMS', 'ALLIE CARTER', 'GARRY REEDER',
        'CHARLES COHEN']
data2 = ['HARRY LEHMAN', 'HANK LEHMAN','MICHAEL NELLIS','ALICE CARTER','ALLIE CARTER','HARRY LEMAN','ALLIE CARTER','MIKE NELLIS','A. CARTER','H. LEHMAN','CHARLES COHEN']

In this version I don't use the score, and you can see that it works fine:

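def match_names(options, keys):
    results = []
    key = []
    for i in keys:
        for j in options:
            match_score = fuzz.ratio(i, j)
            # keep each option once, the first time it scores above 75
            if match_score > 75 and j not in results:
                results += [j]
                key += [i]
    # second positional argument is the index, so the matches become row labels
    return pd.DataFrame(key, results)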

Output:

match_names(data2, data)

HARRY LEHMAN      HARRY LEHMAN
HANK LEHMAN       HARRY LEHMAN
HARRY LEMAN       HARRY LEHMAN
H. LEHMAN         HARRY LEHMAN
MICHAEL NELLIS    MICHAEL NELLIS
MIKE NELLIS       MICHAEL NELLIS
ALICE CARTER      ALLIE CARTER
ALLIE CARTER      ALLIE CARTER
A. CARTER         ALLIE CARTER
CHARLES COHEN     CHARLES COHEN

But when I try to make the function return the score (match_score) as well, it crashes.

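def matchStatements(options, keys):
    results = []
    key = []
    score = []
    for i in keys:
        for j in options:
            match_score = fuzz.ratio(i, j)
            if match_score > 75 and j not in results:
                score += [match_score]
                results += [j]
                key += [i]
    # this call raises the ValueError: the third positional argument
    # of pd.DataFrame is treated as column labels, not as a data column
    return pd.DataFrame(key, results, score)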

When I have the function print the scores, it works fine, but I want each score attached to its match rather than returned as a separate list.

Any help would be greatly appreciated.
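For reference, the error most likely comes from how `pd.DataFrame` reads positional arguments: the first is the data, the second the index, and the third the column labels. So `pd.DataFrame(key, results, score)` has a single column of data but as many column labels as there are scores. A minimal sketch of one possible way around this is to collect each match, its key, and its score as one row and build the frame from those rows. The function name `match_statements`, the `threshold` parameter, the column names, and the `difflib` fallback are assumptions made for this illustration, not part of the original code; the fallback score may differ slightly from `fuzz.ratio` depending on how fuzzywuzzy is installed.

```python
from difflib import SequenceMatcher

try:
    from fuzzywuzzy import fuzz
    ratio = fuzz.ratio
except ImportError:
    # Assumption for this sketch: a difflib-based score on the same 0-100
    # scale, so the example still runs when fuzzywuzzy is not installed.
    def ratio(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def match_statements(options, keys, threshold=75):
    """Return one row (match, key, score) per option scoring above threshold."""
    rows = []
    seen = set()  # options already matched, mirrors `j not in results`
    for key in keys:
        for option in options:
            score = ratio(key, option)
            if score > threshold and option not in seen:
                seen.add(option)
                rows.append({"match": option, "key": key, "score": score})
    return rows


data = ['HARRY LEHMAN', 'MICHAEL NELLIS', 'ALLIE CARTER', 'SCOTT GOODSTEIN',
        'ALLIE CARTER', 'KARIMA WILLIAMS', 'ALLIE CARTER', 'GARRY REEDER',
        'CHARLES COHEN']
data2 = ['HARRY LEHMAN', 'HANK LEHMAN', 'MICHAEL NELLIS', 'ALICE CARTER',
         'ALLIE CARTER', 'HARRY LEMAN', 'ALLIE CARTER', 'MIKE NELLIS',
         'A. CARTER', 'H. LEHMAN', 'CHARLES COHEN']

rows = match_statements(data2, data)

try:
    import pandas as pd
    # A list of dicts gives one column per field, so every score
    # sits on the same row as the match it belongs to.
    df = pd.DataFrame(rows, columns=["match", "key", "score"])
    print(df)
except ImportError:
    for row in rows:
        print(row)
```

Building the frame from rows (or from a dict of equal-length lists such as `{"match": results, "key": key, "score": score}`) keeps the three values aligned and avoids passing the scores where pandas expects column labels.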

0 answers:

No answers