I'm trying to use fuzzy matching to match a list of responses against a validation set.
I'm using the following code:
from fuzzywuzzy import process

for x in rawDatabase.Status:
    choice = process.extractOne(x, my_list)
    print('choice ', choice)
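The Status column of the rawDatabase dataframe is the column I want to validate. my_list is the list of standardised values that the entries in the Status column should be converted to.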
Using the code above, I get the following sample output:
choice ('TRANSFER IN FROM GOVERNMENT DEPARTMENT', 100, 39)
choice ('TRANSFER OUT TO GOVERNMENT DEPARTMENT', 100, 40)
choice ('CURRENT', 100, 1)
choice ('LEAVER - RETIRED', 100, 12)
choice ('CURRENT', 100, 1)
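Each tuple above has the shape (best match, score, key). fuzzywuzzy returns a three-element tuple only when the choices object is dict-like (it has `.items()`, as a pandas Series does), in which case the third element is the key or index label; for a plain list it returns (match, score). Either way, the matched string is element [0]. The sketch below mimics those tuple shapes with the standard library's `difflib.SequenceMatcher` so it runs without fuzzywuzzy installed. `extract_one` and the `statuses` data are hypothetical, and its scores will not equal fuzzywuzzy's, whose default scorer also lowercases and strips punctuation.

```python
from difflib import SequenceMatcher


def extract_one(query, choices):
    """Hypothetical stand-in for process.extractOne, built on difflib.

    Returns (choice, score, key) when `choices` is dict-like (has .items()),
    and (choice, score) when it is a plain sequence, mirroring the tuple
    shapes fuzzywuzzy produces.
    """
    def score(choice):
        # SequenceMatcher.ratio() is in [0, 1]; scale it to 0-100 like fuzzywuzzy
        return round(100 * SequenceMatcher(None, query, choice).ratio())

    if hasattr(choices, "items"):
        # Dict-like: remember which key the best value came from
        key, choice = max(choices.items(), key=lambda kv: score(kv[1]))
        return (choice, score(choice), key)
    choice = max(choices, key=score)
    return (choice, score(choice))


# Hypothetical standardised values keyed by id, like a pandas Series index
statuses = {1: "CURRENT", 12: "LEAVER - RETIRED",
            39: "TRANSFER IN FROM GOVERNMENT DEPARTMENT"}
result = extract_one("CURRENT", statuses)
print(result)     # ('CURRENT', 100, 1)
print(result[0])  # CURRENT
```

Taking `[0]` of the result is therefore enough to get only the matched value.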
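Is there a way to return only the value that best matches the string being tested, and update the Status column of rawDatabase with that value? For example, I would get back: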
choice = 'TRANSFER IN FROM GOVERNMENT DEPARTMENT'
choice = 'TRANSFER OUT TO GOVERNMENT DEPARTMENT'
choice = 'CURRENT'
choice = 'LEAVER - RETIRED'
choice = 'CURRENT'
Answer (score: 1)
Modify your code as follows:
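from fuzzywuzzy import process

l1 = []
for x in rawDatabase.Status:
    # extractOne returns a tuple; element [0] is the best-matching string
    choice = process.extractOne(x, my_list)[0]
    l1.append(choice)
rawDatabase['choice'] = l1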
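The answer stores the matches in a new `choice` column; the question asked to update `Status` itself. Since statuses repeat (CURRENT appears twice in the sample output), matching each distinct value once and reusing the result avoids redundant comparisons. The sketch below assumes the column is available as a plain list and uses the standard library's `difflib.get_close_matches` in place of fuzzywuzzy so it runs without extra packages; `standardise` and the sample data are hypothetical. In pandas, the returned list can be assigned straight back to `rawDatabase['Status']`.

```python
from difflib import get_close_matches


def standardise(values, standard):
    """Hypothetical helper: map each raw value to its closest entry in `standard`.

    Each distinct raw value is matched only once; repeats reuse the cached result.
    """
    mapping = {}
    for v in set(values):
        # n=1 keeps only the best candidate; cutoff=0.0 means one is always returned
        mapping[v] = get_close_matches(v, standard, n=1, cutoff=0.0)[0]
    return [mapping[v] for v in values]


# Hypothetical standard list and raw Status values
my_list = ["CURRENT", "LEAVER - RETIRED",
           "TRANSFER IN FROM GOVERNMENT DEPARTMENT",
           "TRANSFER OUT TO GOVERNMENT DEPARTMENT"]
status = ["CURENT", "LEAVER RETIRED", "TRANSFER IN FROM GOVT DEPARTMENT", "CURRENT"]

# Overwrite the values instead of adding a separate column
status = standardise(status, my_list)
print(status)
```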
A fuller example:
from fuzzywuzzy import process

a = []
for x in df.response:
    # extract(..., limit=1) returns a one-item list of tuples; [0][0] is the matched string
    a.append(process.extract(x, val.validate, limit=1)[0][0])
df['response2'] = a
df
Output:
id colour response response2
0 1 blue curent current
1 2 red loaning loan
2 3 yellow current current
3 4 green loan loan
4 5 red currret current
5 6 green loan loan
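One caveat with the approach above: the closest value is always returned, even for input that resembles nothing in the validation set. fuzzywuzzy's `process.extractOne` accepts a `score_cutoff` argument and returns `None` when no choice reaches it. The sketch below shows the same idea with the standard library's `difflib.get_close_matches`, whose `cutoff` is on a 0 to 1 scale and which, unlike fuzzywuzzy's default processing, is case-sensitive; `match_or_none` and the `"xyz"` input are hypothetical.

```python
from difflib import get_close_matches


def match_or_none(value, choices, cutoff=0.6):
    """Hypothetical helper: closest choice, or None if nothing scores >= cutoff (0-1)."""
    matches = get_close_matches(value, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else None


validate = ["current", "loan", "transfer"]
for raw in ["curent", "loaning", "xyz"]:
    # "xyz" shares no characters with any choice, so it is left unmatched
    print(raw, "->", match_or_none(raw, validate))
```

Unmatched values can then be reviewed by hand instead of being silently forced onto the nearest category.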
Input data:
df:
id colour response
1 blue curent
2 red loaning
3 yellow current
4 green loan
5 red currret
6 green loan
val:
validate
current
loan
transfer
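To check the example end to end without installing fuzzywuzzy, the input tables above can be written as plain Python lists and matched with the standard library's `difflib.get_close_matches`. This swaps in difflib's similarity ratio for fuzzywuzzy's scorer, so scores differ, but on this data it picks the same matches as the `response2` column shown in the output.

```python
from difflib import get_close_matches

# The df.response and val.validate columns above, as plain lists
response = ["curent", "loaning", "current", "loan", "currret", "loan"]
validate = ["current", "loan", "transfer"]

# Best match for each response; cutoff=0.0 guarantees one result per response
response2 = [get_close_matches(r, validate, n=1, cutoff=0.0)[0] for r in response]
print(response2)  # ['current', 'loan', 'current', 'loan', 'current', 'loan']
```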