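I have two DataFrames, shown below. I want to match the values in df2['Data1'] against both df1['Data1'] and df1['Data2'] and keep the df1 rows where either column matches. The approach I use below works, but it is long. Is there a shorter alternative in pandas?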
import pandas as pd
df1 = pd.read_excel("df1.xlsx")
df2 = pd.read_excel("df2.xlsx")
df1
Data1 Data2 Score
ABC AB1 1
AB1 ABC 4
AB2 AB2 6
ABC ABD 0.7
GDH ABD 0.9
KMN KSF 0.5
KSF KSF 6
df2
Data1
AB1
AB2
ABC
ABD
mapped=pd.merge(df1, df2, left_on='Data1', right_on='Data1')
mappedx = pd.merge(df1, df2, left_on='Data2', right_on='Data1')
mappedx.rename(columns = {'Data1_x':'Data1'}, inplace = True)
mappedx = mappedx[['Data1','Data2','Score']]
frame = [mapped, mappedx]
result = pd.concat(frame)
result = result.drop_duplicates()
result
Data1 Data2 Score
ABC AB1 1
AB1 ABC 4
AB2 AB2 6
ABC ABD 0.7
GDH ABD 0.9
Answer 0 (score: 2)
Use Series.isin on each of the two columns and chain the resulting masks with | (bitwise OR):
df = df1[df1['Data1'].isin(df2['Data1']) | df1['Data2'].isin(df2['Data1'])]
print (df)
Data1 Data2 Score
0 ABC AB1 1.0
1 AB1 ABC 4.0
2 AB2 AB2 6.0
3 ABC ABD 0.7
4 GDH ABD 0.9
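To try this without the Excel files, the frames can be built inline from the sample data in the question. This is only a sketch: the inline construction stands in for pd.read_excel, and the names in_data1 and in_data2 are introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the question (built inline instead of read from Excel).
df1 = pd.DataFrame({
    "Data1": ["ABC", "AB1", "AB2", "ABC", "GDH", "KMN", "KSF"],
    "Data2": ["AB1", "ABC", "AB2", "ABD", "ABD", "KSF", "KSF"],
    "Score": [1, 4, 6, 0.7, 0.9, 0.5, 6],
})
df2 = pd.DataFrame({"Data1": ["AB1", "AB2", "ABC", "ABD"]})

# One boolean mask per column: True where the value occurs in df2['Data1'].
in_data1 = df1["Data1"].isin(df2["Data1"])
in_data2 = df1["Data2"].isin(df2["Data1"])

# Keep the rows where at least one of the two columns matched.
df = df1[in_data1 | in_data2]
print(df)
```

Naming the two masks separately makes it easy to inspect which column caused a row to be kept.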
Or use DataFrame.isin together with DataFrame.any:
df = df1[df1[['Data1','Data2']].isin(df2['Data1'].tolist()).any(axis=1)]
print (df)
Data1 Data2 Score
0 ABC AB1 1.0
1 AB1 ABC 4.0
2 AB2 AB2 6.0
3 ABC ABD 0.7
4 GDH ABD 0.9
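As a sanity check, the sketch below compares the DataFrame.isin filter with the merge/concat approach from the question on the same sample data. The frames are built inline as an assumption in place of the Excel files, and the names filtered, cols, and same are made up for this example.

```python
import pandas as pd

# Sample data copied from the question (built inline instead of read from Excel).
df1 = pd.DataFrame({
    "Data1": ["ABC", "AB1", "AB2", "ABC", "GDH", "KMN", "KSF"],
    "Data2": ["AB1", "ABC", "AB2", "ABD", "ABD", "KSF", "KSF"],
    "Score": [1, 4, 6, 0.7, 0.9, 0.5, 6],
})
df2 = pd.DataFrame({"Data1": ["AB1", "AB2", "ABC", "ABD"]})

# Filter from the answer: a row is kept if any of the two columns is in df2['Data1'].
filtered = df1[df1[["Data1", "Data2"]].isin(df2["Data1"].tolist()).any(axis=1)]

# Merge/concat approach from the question, written without inplace operations.
mapped = pd.merge(df1, df2, left_on="Data1", right_on="Data1")
mappedx = pd.merge(df1, df2, left_on="Data2", right_on="Data1")
mappedx = mappedx.rename(columns={"Data1_x": "Data1"})[["Data1", "Data2", "Score"]]
result = pd.concat([mapped, mappedx]).drop_duplicates()

# Compare the two results while ignoring row order and index labels.
cols = ["Data1", "Data2", "Score"]
same = (
    filtered.sort_values(cols).reset_index(drop=True)
    .equals(result.sort_values(cols).reset_index(drop=True))
)
print(same)
```

One difference to keep in mind: if df1 itself contains fully duplicated rows, drop_duplicates in the merge version collapses them, while the isin filter keeps every original row and its index.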