Fuzzy match ratio based on two columns

Time: 2019-06-13 03:47:21

Tags: python fuzzywuzzy

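I have been using this code with Fuzzywuzzy for string matching, but I am running into problems getting output based on two different columns (name and address).

I have the following two CSV files:

Database: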

ID_database  name   Address 
1            Jean   1123 street
2            Paul   145 street
3            Bob    142 street

Client:

ID_client    CUSTOMER_NAME     Address
1            Jen               1123 st.
2            Paul.P            145 st.

Here is my code so far. It returns the best match and its match ratio for a single column.

    from fuzzywuzzy import process
    import pandas as pd


    def match_names(name_universe, clients_name):
        # For each name in name_universe, find the closest entry in clients_name.
        names_array = []
        ratio_array = []
        for row in name_universe:
            x = process.extractOne(row, clients_name)
            names_array.append(x[0])  # best-matching string
            ratio_array.append(x[1])  # its score (0-100)
        return names_array, ratio_array


    df = pd.read_csv("Master.csv", encoding="ISO-8859-1")
    # Keep the index of the non-null names so results can be aligned with df later.
    names = df['CUSTOMER_NAME'].dropna()
    name_universe = names.values

    clients_df = pd.read_csv("client.csv", encoding="ISO-8859-1")
    clients_name = clients_df['name'].values
    id_code = clients_df['Customer_Code'].values

    name_match, ratio_match = match_names(name_universe, clients_name)

    # Reuse the index of the non-null names so each result lands on the right row.
    df['match_universe_name'] = pd.Series(name_match, index=names.index)
    df['names_ratio'] = pd.Series(ratio_match, index=names.index)

    df.to_csv("database.csv")

    print(df[['CUSTOMER_NAME', 'match_universe_name', 'names_ratio']].head(10))

I would like to compare the match ratio for both the name and the address, so that the final Excel output looks like this:

    ID_database  Name   Address       ID_client    Name     Address   RatioName  Ratioaddress
    1            Jean   1123 street   1            Jen      1123 st.  0.77       0.80
    2            Paul   145 street    2            Paul.P   145 st.   0.97       0.99
    3            Bob    142 street

How can I modify my code to get this output?
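One possible way to get a layout like the one requested is to pick the best client for each database row by name, and then score the addresses of that same pair. The following is only a minimal sketch under stated assumptions, not a confirmed solution: the rows are copied from the sample tables in the question instead of being read from the CSV files, `difflib.SequenceMatcher` from the standard library stands in for a Fuzzywuzzy scorer so the snippet runs without extra packages (its scores will not necessarily equal Fuzzywuzzy's, nor the numbers in the example output), and `similarity`, `match_rows`, `client_Address` and the `min_name_ratio` cutoff are hypothetical names introduced for illustration.

```python
from difflib import SequenceMatcher

# Sample rows copied from the tables in the question.
database = [
    {"ID_database": 1, "name": "Jean", "Address": "1123 street"},
    {"ID_database": 2, "name": "Paul", "Address": "145 street"},
    {"ID_database": 3, "name": "Bob", "Address": "142 street"},
]
clients = [
    {"ID_client": 1, "CUSTOMER_NAME": "Jen", "Address": "1123 st."},
    {"ID_client": 2, "CUSTOMER_NAME": "Paul.P", "Address": "145 st."},
]


def similarity(a, b):
    """Return a 0..1 similarity score (stand-in for a Fuzzywuzzy scorer)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_rows(database, clients, min_name_ratio=0.6):
    """Pair each database row with its best client by name, then score addresses.

    min_name_ratio is a hypothetical cutoff; rows scoring below it stay unmatched.
    """
    result = []
    for row in database:
        # Pick the client whose name is most similar to this row's name.
        best = max(clients, key=lambda c: similarity(row["name"], c["CUSTOMER_NAME"]))
        name_ratio = similarity(row["name"], best["CUSTOMER_NAME"])
        out = dict(row)
        if name_ratio >= min_name_ratio:
            out.update({
                "ID_client": best["ID_client"],
                "CUSTOMER_NAME": best["CUSTOMER_NAME"],
                "client_Address": best["Address"],
                "RatioName": round(name_ratio, 2),
                # Compare the addresses of the pair that was matched by name.
                "RatioAddress": round(similarity(row["Address"], best["Address"]), 2),
            })
        result.append(out)
    return result


matched = match_rows(database, clients)
for row in matched:
    print(row)
```

With the sample data, Jean is paired with Jen and Paul with Paul.P, while Bob shares no characters with either client name and is left without client columns, as in the desired output. With Fuzzywuzzy installed, `similarity` could presumably be replaced by something like `fuzz.ratio(a, b) / 100`, since Fuzzywuzzy scores are on a 0-100 scale while the desired output uses 0-1. The list of dictionaries can be passed to `pd.DataFrame(matched)` and written out with `to_excel` if an Excel file is needed.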

0 answers:

No answers yet