Question

I have two DataFrames: Instructor_Info and Operator_Info.

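Instructor_Info has two columns, Names and OperatorName, and Operator_Info also has a column named Names. Every name in Instructor_Info has a corresponding name in Operator_Info. I want to find these matches with fuzz.token_sort_ratio(): compare each name in Instructor_Info with every name in Operator_Info, then store the highest-scoring string in the OperatorName column.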
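To make the scoring step concrete: token_sort_ratio lowercases both strings, strips punctuation, sorts the words, and then computes a 0-100 similarity ratio, so word order does not matter. The sketch below approximates it with only the standard library. It is a hypothetical stand-in (token_sort_ratio_approx is not part of fuzzywuzzy), and it assumes difflib.SequenceMatcher as the similarity measure; fuzzywuzzy uses python-Levenshtein when installed and handles non-ASCII characters differently, so exact scores may differ.

```python
import re
from difflib import SequenceMatcher


def token_sort_ratio_approx(a: str, b: str) -> int:
    """Hypothetical stdlib approximation of fuzz.token_sort_ratio (0-100)."""

    def normalize(s: str) -> str:
        # Lowercase, turn anything that is not an ASCII letter or digit into
        # a space, then sort the words so that word order is ignored.
        tokens = re.sub(r"[^0-9a-z]+", " ", s.lower()).split()
        return " ".join(sorted(tokens))

    # SequenceMatcher.ratio() is in [0, 1]; scale it to 0-100 like fuzzywuzzy.
    return round(100 * SequenceMatcher(None, normalize(a), normalize(b)).ratio())
```

For example, "Smith, John" and "john smith" both normalize to "john smith", so they score 100.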

Here is what I have so far:

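from fuzzywuzzy import fuzz

for index, row in Instructor_Info.iterrows():
    best_score = 0
    for index1, row1 in Operator_Info.iterrows():
        score = fuzz.token_sort_ratio(row['Names'], row1['Names'])
        if score > best_score:
            best_score = score  # remember the best score seen so far
            # write into the DataFrame itself; `row` may be a copy, not a view
            Instructor_Info.at[index, 'OperatorName'] = row1['Names']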

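This code runs very slowly and produces some incorrect matches (I can fix those by hand, so speed is the main problem). Any ideas for making it faster would be much appreciated. Thanks in advance.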
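One way to cut the overhead is sketched below, under the assumption that the slowness comes mostly from iterrows() (which builds a Series object per row) rather than from the scorer itself. The hypothetical helper best_matches iterates over plain Python lists, drops duplicate operator names, and scores each distinct instructor name only once. It still compares every pair, so the speedup is a constant factor, not a change in complexity. The scorer is passed in, so fuzz.token_sort_ratio can be used directly.

```python
from typing import Callable, Dict, List, Sequence


def best_matches(queries: Sequence[str],
                 choices: Sequence[str],
                 scorer: Callable[[str, str], float]) -> List[str]:
    """Hypothetical helper: for each query, return the highest-scoring choice.

    Raises ValueError if `choices` is empty (max() of an empty sequence).
    """
    # Remove duplicate choices while keeping their original order.
    unique_choices = list(dict.fromkeys(choices))
    cache: Dict[str, str] = {}
    result: List[str] = []
    for q in queries:
        if q not in cache:
            # max() returns the first choice among ties, matching a strict '>' loop.
            cache[q] = max(unique_choices, key=lambda c: scorer(q, c))
        result.append(cache[q])
    return result
```

With pandas and fuzzywuzzy installed, a call such as `Instructor_Info['OperatorName'] = best_matches(Instructor_Info['Names'].tolist(), Operator_Info['Names'].tolist(), fuzz.token_sort_ratio)` would fill the column in one pass. fuzzywuzzy's own `process.extractOne(name, choices, scorer=fuzz.token_sort_ratio)` performs the same inner loop for a single name. The separate rapidfuzz package exposes `fuzz.token_sort_ratio` and `process.extractOne` under the same names with a C++ implementation, which is usually much faster.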

Matching strings in pandas DataFrames with FuzzyWuzzy

0 answers: