Question

I have a pandas DataFrame in which two columns contain strings, as shown below:

Col-1                 Col-2
Animal                have an apple
Fruit                 tiger safari
Veg                   Vegetable Market
Flower                Garden

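From this, I need to create a function that takes a string as an argument.

The function should use fuzzywuzzy to check the similarity between the input string and each element of Col-2, and then output the Col-1 and Col-2 elements that correspond to the highest computed similarity.

For example, suppose the input string is Gardening Hobby. The function checks its similarity against every element of df['Col-2'] and finds that Garden is the most similar to Gardening Hobby, with a score of 90. The expected output is then: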

I/P               O/P
Gardening Hobby   Garden(90),Flower

Answer 1

Use the fuzzywuzzy library (see its tutorial).

Try the following approach:

from fuzzywuzzy import process

search_str = 'Gardening Hobby'
# extract the best match of search_str in df['Col-2']
best_match = process.extractOne(search_str, df['Col-2'])
print(best_match)  # output: ('Garden', 90, 3)  (match,score,index)

# get the 'Col-1' value for the matched row; for a Series, extractOne
# returns the index label, so look it up with .loc rather than .iloc
res = df.loc[best_match[2], 'Col-1']
print(res)  # output: Flower

# construct the output string as you wish
print('%s(%d), %s' % (best_match[0], best_match[1], res))
# output: Garden(90), Flower

Finding the similarity between a string input and a string column of a DataFrame
