Question

我有以下数据框：

df = pd.DataFrame(
    {'id': [1, 2, 3, 4, 5, 6], 
     'fruits': ['apple', 'apples', 'orange', 'apple tree', 'oranges', 'mango']
    })
   id      fruits
0   1       apple
1   2      apples
2   3      orange
3   4  apple tree
4   5     oranges
5   6       mango

我希望在列fruits中找到模糊字符串，并按如下方式获得一个新的数据帧，该数据帧的ratio_score高于80。

在Python中如何使用Fuzzywuzzy软件包做到这一点？谢谢。请注意，ratio_score是一系列构成示例的值。

我的解决方案：

df.loc[:,'fruits_copy'] = df['fruits']
df['ratio_score'] = df[['fruits', 'fruits_copy']].apply(lambda row: fuzz.ratio(row['fruits'], row['fruits_copy']), axis=1)

预期结果：

     id      fruits    matched_id     matched_fruits   ratio_score   
0     1       apple        2                apples           95
1     1       apple        4            apple tree           85     
2     2      apples        4            apple tree           80   
3     3      orange        5               oranges           95     
4     6       mango

与参考相关：

Fuzzy matching a sorted column with itself using python

Apply fuzzy matching across a dataframe column and save results in a new column

How do I fuzzy match items in a column of an array in python?

Using fuzzywuzzy to create a column of matched results in the data frame

Answer 1

我的解决方案，其引用如下：Apply fuzzy matching across a dataframe column and save results in a new column

df.loc[:,'fruits_copy'] = df['fruits']

compare = pd.MultiIndex.from_product([df['fruits'],
                                      df['fruits_copy']]).to_series()

def metrics(tup):
    return pd.Series([fuzz.ratio(*tup),
                      fuzz.token_sort_ratio(*tup)],
                     ['ratio', 'token'])

compare.apply(metrics)

                       ratio  token
apple      apple         100    100
           apples         91     91
           orange         36     36
           apple tree     67     67
           oranges        33     33
           mango          20     20
apples     apple          91     91
           apples        100    100
           orange         33     33
           apple tree     62     62
           oranges        46     46
           mango          18     18
orange     apple          36     36
           apples         33     33
           orange        100    100
           apple tree     25     25
           oranges        92     92
           mango          55     55
apple tree apple          67     67
           apples         62     62
           orange         25     25
           apple tree    100    100
           oranges        24     24
           mango          13     13
oranges    apple          33     33
           apples         46     46
           orange         92     92
           apple tree     24     24
           oranges       100    100
           mango          50     50
mango      apple          20     20
           apples         18     18
           orange         55     55
           apple tree     13     13
           oranges        50     50
           mango         100    100

在一列中模糊匹配字符串，并使用Fuzzywuzzy创建新的数据框

1 个答案: