Error when merging datasets with fuzzy matching

Time: 2018-07-12 05:10:06

Tags: python pandas merge matching fuzzy

I am new to Python and am really struggling to find a simple fuzzy-matching method for merging two dataframes.

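I have two datasets that share one column: the name of the observation (factories, in this case). To keep things simple, I kept only the name column from each dataframe, so here I am trying to merge two columns: name_tracker and name_fin. Both columns have dtype object.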

Following this article, https://medium.com/@rtjeannier/combining-data-sets-with-fuzzy-matching-17efcb510ab2, I tried the code below, but I get the following error:

import pandas as pd
from fuzzywuzzy import fuzz

def match_name(name, list_names, min_score=0):
    # -1 score in case we don't get any matches
    max_score = -1
    # Return an empty name when there is no match
    max_name = ""
    # Iterate over all names in the other list
    for name2 in list_names:
        # Compute the fuzzy match score
        score = fuzz.ratio(name, name2)
        # Keep the candidate if it beats both the threshold and the best score so far
        if score > min_score and score > max_score:
            max_name = name2
            max_score = score
    return (max_name, max_score)


# List of dicts for easy dataframe creation
dict_list = []

table.reset_index(inplace=True)


for name in name_tracker:
    # Use our method to find the best match; we can set a threshold here
    match = match_name(name, name_fin, 75)

    # New dict for storing data
    dict_ = {}
    dict_.update({"factory_name": name})
    # Separate key for the matched name, so it does not overwrite "factory_name"
    dict_.update({"match_name": match[0]})
    dict_.update({"score": match[1]})
    dict_list.append(dict_)

merge_table = pd.DataFrame(dict_list)
# Display results
merge_table
    if len(args[0]) == 0 or len(args[1]) == 0:

TypeError: object of type 'int' has no len()
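
The message suggests that at least one value reaching the comparison is an integer rather than a string, which can happen in an object-dtype column. The following is a minimal sketch of how that could be checked, using only the standard library. The sample values are hypothetical, since the real contents of name_tracker and name_fin are not shown here.

```python
# Hypothetical sample values; the real name columns are not shown in the question.
names = ["Acme Textiles", 1024, "Delta Garments", None]

# len() is only defined for sized objects such as str, so an int raises TypeError.
try:
    len(names[1])
except TypeError as exc:
    error_message = str(exc)

# Collect every entry that is not a string, together with its position in the list.
non_strings = [(i, value) for i, value in enumerate(names) if not isinstance(value, str)]

print(error_message)
print(non_strings)
```

If a check like this turns up non-string entries, converting them with str() (or dropping them) before matching is one possible way to avoid the error.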

What does this error mean? Is there a simpler way to run a fuzzy match in order to merge the datasets?
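
As one possible simpler route, here is a rough sketch that swaps fuzzywuzzy for difflib.SequenceMatcher from the standard library and converts every value to str before comparing. The helper name best_match and the sample lists are hypothetical, and the scores from SequenceMatcher are not guaranteed to equal those from fuzz.ratio.

```python
from difflib import SequenceMatcher

def best_match(name, candidates, min_score=75):
    """Return (best_candidate, score); ("", -1) if nothing beats min_score."""
    best_name, best_score = "", -1
    for candidate in candidates:
        # str() guards against ints or other non-string values in the columns
        ratio = SequenceMatcher(None, str(name), str(candidate)).ratio()
        score = round(100 * ratio)  # scale to 0..100
        if score > min_score and score > best_score:
            best_name, best_score = str(candidate), score
    return best_name, best_score

# Hypothetical sample data standing in for the two name columns
name_tracker = ["Acme Textiles Ltd", "Delta Garments", 1024]
name_fin = ["Acme Textile Ltd.", "Delta Garment", "Omega Shoes"]

rows = []
for name in name_tracker:
    match, score = best_match(name, name_fin, 75)
    rows.append({"factory_name": name, "match_name": match, "score": score})

for row in rows:
    print(row)
```

The resulting list of dicts could then be passed to pd.DataFrame(rows), as in the code above, to obtain the merged table.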

0 answers:

No answers.