How to merge two pandas DataFrames on slightly different keys

Posted: 2020-11-11 12:32:41

Tags: python pandas merge

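I want to merge two datasets on the key institution, whose values are written slightly differently in the two DataFrames. The DataFrames look like this: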

import pandas as pd

df1 = pd.DataFrame({'institution': ['Havard University', 'Oxford University', 'University of Kent', 'Harvard University']})
df2 = pd.DataFrame({'institution': ['Havard University', 'University of Oxford', 'Kent University'], 'ranking': ['very good', 'very good', 'good']})

How can I do this?

2 answers:

Answer 0 (score: 0)

You can use a fuzzy merge.

You can check the different matching methods at these links: fuzzy_pandas; different types of string similarity algorithms.

import fuzzy_pandas as fpd
# Fuzzy-match the 'institution' columns using Jaro similarity, ignoring case
merged_df = fpd.fuzzy_merge(df1, df2, left_on=['institution'], right_on=['institution'], method='jaro', ignore_case=True, threshold=0.9)
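If installing fuzzy_pandas is not an option, a similar idea can be sketched with the standard library's difflib.get_close_matches. This is a swapped-in technique (difflib's SequenceMatcher ratio, not Jaro), not what fuzzy_pandas does internally. The normalization rules (lowercasing, dropping "of", sorting the words so "Oxford University" lines up with "University of Oxford") and the 0.8 cutoff are assumptions chosen for this data, and normalize and fuzzy_key are hypothetical helper names.

```python
import difflib

import pandas as pd

df1 = pd.DataFrame({'institution': ['Havard University', 'Oxford University',
                                    'University of Kent', 'Harvard University']})
df2 = pd.DataFrame({'institution': ['Havard University', 'University of Oxford', 'Kent University'],
                    'ranking': ['very good', 'very good', 'good']})


def normalize(name: str) -> str:
    """Hypothetical helper: lowercase, drop the filler word 'of', sort words so order does not matter."""
    words = [w for w in name.lower().split() if w != 'of']
    return ' '.join(sorted(words))


def fuzzy_key(name: str, choices: dict, cutoff: float = 0.8):
    """Hypothetical helper: return the df2 spelling closest to `name`, or None if nothing passes `cutoff`."""
    match = difflib.get_close_matches(normalize(name), list(choices), n=1, cutoff=cutoff)
    return choices[match[0]] if match else None


# Normalized df2 name -> original df2 spelling
choices = {normalize(n): n for n in df2['institution']}

# Store the best df2 match for each df1 row, then join on it
df1['match'] = df1['institution'].map(lambda n: fuzzy_key(n, choices))
merged = df1.merge(df2, left_on='match', right_on='institution', how='left',
                   suffixes=('', '_df2'))
print(merged[['institution', 'ranking']])
```

Because the cutoff is a similarity threshold, it trades missed matches against false ones; on real data it is worth inspecting the match column before trusting the join.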

Answer 1 (score: 0)

You can create a mapping dictionary that translates each name in df1 into the name to join on:

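# Dictionary translating each df1 spelling into a target name to join on
mapping = {'Havard University': 'Harvard University', 'Oxford University': 'University of Oxford', 'University of Kent': 'Kent University', 'Harvard University': 'Harvard University'}

# Values missing from the dict become NaN
df1['institution'] = df1.institution.map(mapping)
# Left join keeps every row of df1, with NaN where df2 has no matching name
df1.merge(df2, on='institution', how='left')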

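Output (the two Harvard rows get NaN because df2 spells the name "Havard University", so the mapped value "Harvard University" has no match):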

            institution    ranking
0    Harvard University        NaN
1  University of Oxford  very good
2       Kent University       good
3    Harvard University        NaN
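The NaN rows above are easy to miss in a larger table. A minimal sketch, assuming the mapping is meant to target df2's exact spelling (including its "Havard" typo): map into a separate column (the name key is hypothetical) so the original names survive, and pass indicator=True to merge, which adds a _merge column marking each row as 'both' or 'left_only'.

```python
import pandas as pd

df1 = pd.DataFrame({'institution': ['Havard University', 'Oxford University',
                                    'University of Kent', 'Harvard University']})
df2 = pd.DataFrame({'institution': ['Havard University', 'University of Oxford', 'Kent University'],
                    'ranking': ['very good', 'very good', 'good']})

# Assumption: every df1 spelling should map to the exact spelling used in df2
mapping = {'Havard University': 'Havard University',
           'Harvard University': 'Havard University',
           'Oxford University': 'University of Oxford',
           'University of Kent': 'Kent University'}

# Map into a new column so the original df1 names are kept
df1['key'] = df1['institution'].map(mapping)

# indicator=True adds '_merge': 'both' if the key exists in df2, 'left_only' if not
merged = df1.merge(df2, left_on='key', right_on='institution', how='left',
                   suffixes=('', '_df2'), indicator=True)

unmatched = merged.loc[merged['_merge'] == 'left_only', 'institution']
print(merged[['institution', 'ranking', '_merge']])
print('Unmatched:', list(unmatched))
```

Any name listed under "Unmatched" points to a mapping entry that is missing or does not match df2's spelling.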