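I want to merge two datasets on the key institution, which is present in both data frames, but the names are not written the same way in each. The data frames look like this: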
df1 = pd.DataFrame({'institution':['Havard University', 'Oxford University', 'University of Kent', 'Harvard University']})
df2 = pd.DataFrame({'institution':['Havard University', 'University of Oxford', 'Kent University'], 'ranking': ['very good', 'very good', 'good']})
How can I do this?
Answer 0: (score: 0)
You can use a fuzzy merge.
The links below describe the available approaches: fuzzy_pandas, different types of string similarity algorithms
import fuzzy_pandas as fpd
# Both frames store the name in 'institution', so join on that column.
merged_df = fpd.fuzzy_merge(df1, df2, left_on=['institution'], right_on=['institution'], method='jaro', ignore_case=True, threshold=0.9)
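If installing fuzzy_pandas is not an option, the same idea can be roughed out with the standard library. The sketch below is not what fuzzy_pandas does internally: it swaps in difflib's similarity ratio, and the helper names normalize and build_mapping, the choice to drop the word "of", and the 0.8 cutoff are all assumptions made for illustration.

```python
import difflib


def normalize(name):
    """Lowercase, drop the filler word 'of', and sort tokens so word order is ignored."""
    tokens = [t for t in name.lower().split() if t != "of"]
    return " ".join(sorted(tokens))


def build_mapping(left_names, right_names, cutoff=0.8):
    """Map each left name to its closest right name, or None if nothing clears the cutoff."""
    normalized_right = {normalize(r): r for r in right_names}
    mapping = {}
    for name in left_names:
        hits = difflib.get_close_matches(
            normalize(name), list(normalized_right), n=1, cutoff=cutoff
        )
        mapping[name] = normalized_right[hits[0]] if hits else None
    return mapping


left = ['Havard University', 'Oxford University', 'University of Kent', 'Harvard University']
right = ['Havard University', 'University of Oxford', 'Kent University']

# Keys are the spellings in df1, values are the matching spellings in df2.
mapping = build_mapping(left, right)
```

The resulting dictionary can then be passed to df1['institution'].map(...) to create a join key that matches df2's spelling. It is worth inspecting the mapping by hand before merging, since a loose cutoff can pair up names that are merely similar.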
Answer 1: (score: 0)
You can create a mapping dictionary:
mapping = {'Havard University':'Harvard University','Oxford University':'University of Oxford','University of Kent':'Kent University','Harvard University':'Harvard University'}
df1['institution'] = df1.institution.map(mapping)
df1.merge(df2,on='institution',how='left')
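institution ranking
0 Harvard University NaN
1 University of Oxford very good
2 Kent University good
3 Harvard University NaN
Rows 0 and 3 have no ranking because df2 itself spells that key 'Havard University', so the corrected name finds no match there. Either map to df2's spelling or correct the spelling in df2 as well.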
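One way to avoid unmatched rows like the NaN entries in the output above is to map both frames onto a single canonical spelling and join on that. This is a minimal sketch, not the answer's own code: the canonical dictionary and the key column are hypothetical names introduced here, and the dictionary has to list every variant that appears in either frame.

```python
import pandas as pd

df1 = pd.DataFrame({'institution': ['Havard University', 'Oxford University',
                                    'University of Kent', 'Harvard University']})
df2 = pd.DataFrame({'institution': ['Havard University', 'University of Oxford',
                                    'Kent University'],
                    'ranking': ['very good', 'very good', 'good']})

# Hypothetical canonical spellings: every variant in either frame maps to one name.
canonical = {
    'Havard University': 'Harvard University',
    'Harvard University': 'Harvard University',
    'Oxford University': 'University of Oxford',
    'University of Oxford': 'University of Oxford',
    'University of Kent': 'Kent University',
    'Kent University': 'Kent University',
}

# Put the join key in a separate column so the original spelling is preserved.
left = df1.assign(key=df1['institution'].map(canonical))
right = df2.assign(key=df2['institution'].map(canonical)).drop(columns='institution')

# indicator=True adds a '_merge' column that flags rows with no partner in df2.
merged = left.merge(right, on='key', how='left', indicator=True)
unmatched = merged[merged['_merge'] == 'left_only']
```

Checking unmatched after the merge is a cheap way to spot names that are still missing from the dictionary.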