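I have been using this code with Fuzzywuzzy for string matching, but I am having trouble producing output based on two different columns (name and address).
I have the following two CSV files:
Database: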
ID_database  name  Address
1            Jean  1123 street
2            Paul  145 street
3            Bob   142 street
Client:
ID_client  CUSTOMER_NAME  Address
1          Jen            1123 st.
2          Paul.P         145 st.
Here is my code so far. It returns the best match and the match ratio for a single field only.
from fuzzywuzzy import process
import pandas as pd


def match_names(name_universe, clients_name):
    # For each name, keep the best-scoring candidate and its score (0-100).
    names_array = []
    ratio_array = []
    for row in name_universe:
        x = process.extractOne(row, clients_name)
        names_array.append(x[0])
        ratio_array.append(x[1])
    return names_array, ratio_array


df = pd.read_csv("Master.csv", encoding="ISO-8859-1")
# dropna() can shorten this array relative to df, so the column
# assignments below line up only if no names are missing.
name_universe = df['CUSTOMER_NAME'].dropna().values

clients_df = pd.read_csv("client.csv", encoding="ISO-8859-1")
clients_name = clients_df['name'].values
id_code = clients_df['Customer_Code'].values

name_match, ratio_match = match_names(name_universe, clients_name)

df['match_universe_name'] = pd.Series(name_match)
df['names_ratio'] = pd.Series(ratio_match)

df.to_csv("database.csv")
print(df[['CUSTOMER_NAME', 'match_universe_name', 'names_ratio']].head(10))
I would like to compare the match ratio for both the name and the address, so that the final Excel output looks like this:
ID_database  Name  Address      ID_client  Name    Address   RatioName  Ratioaddress
1            Jean  1123 street  1          Jen     1123 st.  0.77       0.80
2            Paul  145 street   2          Paul.P  145 st.   0.97       0.99
3            Bob   142 street
How can I modify my code to get this output?
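To make the target concrete, here is a rough sketch of the row shape I am after. It is only an illustration under several assumptions: it uses `difflib.SequenceMatcher` from the standard library as a stand-in for a Fuzzywuzzy scorer (so the ratios will differ from Fuzzywuzzy's and from the example figures above), it works on plain dictionaries instead of DataFrames, and the names `ratio`, `match_rows`, `threshold`, `Client_Name`, and `Client_Address` are hypothetical.

```python
from difflib import SequenceMatcher

# Sample rows copied from the two tables in the question.
database = [
    {"ID_database": 1, "name": "Jean", "Address": "1123 street"},
    {"ID_database": 2, "name": "Paul", "Address": "145 street"},
    {"ID_database": 3, "name": "Bob", "Address": "142 street"},
]
clients = [
    {"ID_client": 1, "CUSTOMER_NAME": "Jen", "Address": "1123 st."},
    {"ID_client": 2, "CUSTOMER_NAME": "Paul.P", "Address": "145 st."},
]


def ratio(a, b):
    # Similarity in [0, 1]; a stdlib stand-in, not Fuzzywuzzy's scorer.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_rows(database, clients, threshold=0.6):
    # threshold is a hypothetical cutoff; rows scoring below it stay unmatched.
    out = []
    for db in database:
        row = {
            "ID_database": db["ID_database"],
            "Name": db["name"],
            "Address": db["Address"],
        }
        # Score every client by name and keep the best-scoring one.
        scored = [(ratio(db["name"], c["CUSTOMER_NAME"]), c) for c in clients]
        best_score, best = max(scored, key=lambda pair: pair[0],
                               default=(0.0, None))
        if best is not None and best_score >= threshold:
            # The address ratio is computed only for the client chosen by name.
            row.update({
                "ID_client": best["ID_client"],
                "Client_Name": best["CUSTOMER_NAME"],
                "Client_Address": best["Address"],
                "RatioName": round(best_score, 2),
                "RatioAddress": round(ratio(db["Address"], best["Address"]), 2),
            })
        out.append(row)
    return out


rows = match_rows(database, clients)
for row in rows:
    print(row)
```

In this sketch the client is picked by name alone and the address ratio is then reported for that same pair, which leaves Bob without client columns. Whether the address should also influence which client is picked is part of what I am unsure about.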