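Fuzzy matching - returning the best potential value for the string being tested

Time: 2017-10-18 15:08:07

Tags: python pandas

I am trying to use fuzzy matching to check a list of responses against a validation set.

I am using the following code: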

from fuzzywuzzy import process

for x in rawDatabase.Status:
    choice = process.extractOne(x, my_list)
    print('choice ', choice)

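The Status column of the rawDatabase dataframe is the column I want to validate. my_list is the list of standardized values that the entries in the Status column should be mapped to.

With the code above I get the following sample output: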

choice  ('TRANSFER IN FROM GOVERNMENT DEPARTMENT', 100, 39)
choice  ('TRANSFER OUT TO GOVERNMENT DEPARTMENT', 100, 40)
choice  ('CURRENT', 100, 1)
choice  ('LEAVER - RETIRED', 100, 12)
choice  ('CURRENT', 100, 1)

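Is there a way to return only the value that best matches the string being tested, and to update the rawDatabase Status column with that value? So, for example, I would get back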

choice = 'TRANSFER IN FROM GOVERNMENT DEPARTMENT'
choice = 'TRANSFER OUT TO GOVERNMENT DEPARTMENT'
choice = 'CURRENT'
choice = 'LEAVER - RETIRED'
choice = 'CURRENT'

1 Answer:

Answer 0: (score: 1)

Modify your code as follows:

l1 = []
for x in rawDatabase.Status:
    # extractOne returns a tuple; element 0 is the matched string
    choice = process.extractOne(x, my_list)[0]
    l1.append(choice)
rawDatabase['choice'] = l1
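
Indexing with `[0]` works because `extractOne` returns a tuple whose first element is the matched string and whose second is the score, as the output in the question shows. One caveat is that `[0]` accepts the top candidate however low its score is. Below is a minimal sketch of one way to keep the original value when nothing matches well. It is dependency-free: `extract_one` is a hypothetical stand-in built on `difflib` from the standard library, not fuzzywuzzy's own scorer, and the threshold of 80, the `standardize` helper, and the sample strings are assumptions made for illustration.

```python
from difflib import SequenceMatcher

def extract_one(query, choices):
    """Hypothetical stand-in for process.extractOne: returns (best_choice, score)."""
    # Score every candidate on a 0-100 scale using difflib's similarity ratio.
    scored = [(c, round(100 * SequenceMatcher(None, query, c).ratio())) for c in choices]
    # max() keeps the first candidate when several share the top score.
    return max(scored, key=lambda pair: pair[1])

def standardize(value, choices, min_score=80):
    """Return the best match, or the original value if the score is too low."""
    match, score = extract_one(value, choices)
    return match if score >= min_score else value

# Assumed sample data; my_list mirrors a few values from the question.
my_list = ["CURRENT", "LEAVER - RETIRED", "TRANSFER IN FROM GOVERNMENT DEPARTMENT"]
statuses = ["CURENT", "LEAVER - RETIRD", "UNKNOWN CODE 7"]

cleaned = [standardize(s, my_list) for s in statuses]
print(cleaned)
```

With fuzzywuzzy itself, `process.extractOne` also accepts a `score_cutoff` argument and returns `None` when no choice reaches it, so a similar guard can be written as a `None` check before taking `[0]`.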

A fuller example:

from fuzzywuzzy import process

a = []
for x in df.response:
    # extractOne returns the best match as a tuple; element 0 is the matched string
    a.append(process.extractOne(x, val.validate)[0])
df['response2'] = a
df

Out[867]: 
   id  colour response response2
0   1    blue   curent   current
1   2     red  loaning      loan
2   3  yellow  current   current
3   4   green     loan      loan
4   5     red  currret   current
5   6   green     loan      loan

Input data:

df:

id colour  response
 1   blue    curent 
 2    red   loaning
 3 yellow   current
 4  green      loan 
 5    red   currret
 6  green      loan

val:

validate
 current
    loan
transfer