I have two data frames.
df1:
   Name Symbol       ID
0   Jay    N/A  372Y105
1   Ray    N/A  4446100
2  Faye    N/A  484MAA4
3  Maye    N/A  504W308
4   Kay    N/A  782L107
5  Trey    FFF  782L111
df2:
   Name Symbol       ID
0   Jay    AAA  372Y105
1  Faye    CCC  484MAA4
2   Kay    EEE  782L107
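If the ID matches between df1 and df2, I want to replace the Symbol in df1 with the Symbol from df2, so that the output looks like this:
   Name Symbol       ID
0   Jay    AAA  372Y105
1   Ray    N/A  4446100
2  Faye    CCC  484MAA4
3  Maye    N/A  504W308
4   Kay    EEE  782L107
5  Trey    FFF  782L111
It sounds like I should first concatenate the two data frames and then somehow drop the duplicates, for example:
df3 = pd.concat([df1, df2])
df3 = df3.drop_duplicates(subset='ID', keep='last')
However, rather than only keeping the first or last duplicate, I also want to drop those duplicates where the Symbol is N/A.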
Answer 0 (score: 1)
First use merge with a left join, then replace the missing values in the Symbol column with the values from the Symbol_ column:
print(df1.merge(df2, on=['Name','ID'], how='left', suffixes=('', '_')))
   Name Symbol       ID Symbol_
0   Jay    NaN  372Y105     AAA
1   Ray    NaN  4446100     NaN
2  Faye    NaN  484MAA4     CCC
3  Maye    NaN  504W308     NaN
4   Kay    NaN  782L107     EEE
5  Trey    FFF  782L111     NaN
df = (df1.merge(df2, on=['Name','ID'], how='left', suffixes=('', '_'))
         .assign(Symbol=lambda x: x['Symbol'].fillna(x.pop('Symbol_'))))
print(df)
   Name Symbol       ID
0   Jay    AAA  372Y105
1   Ray    NaN  4446100
2  Faye    CCC  484MAA4
3  Maye    NaN  504W308
4   Kay    EEE  782L107
5  Trey    FFF  782L111
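Note that the output above shows NaN where the question shows N/A, which suggests the answer assumes the placeholders were read in as real missing values. The following is a minimal, self-contained sketch of the same merge approach, assuming instead that the column holds the literal string 'N/A' (an assumption, since the question does not say how the frames were loaded). In that case the placeholder probably needs converting to NaN first, otherwise fillna has nothing to fill. The frame contents are copied from the question; the name `merged` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Name": ["Jay", "Ray", "Faye", "Maye", "Kay", "Trey"],
    "Symbol": ["N/A", "N/A", "N/A", "N/A", "N/A", "FFF"],
    "ID": ["372Y105", "4446100", "484MAA4", "504W308", "782L107", "782L111"],
})
df2 = pd.DataFrame({
    "Name": ["Jay", "Faye", "Kay"],
    "Symbol": ["AAA", "CCC", "EEE"],
    "ID": ["372Y105", "484MAA4", "782L107"],
})

# Assumption: Symbol holds the literal text "N/A", so turn it into a real NaN.
df1["Symbol"] = df1["Symbol"].replace("N/A", np.nan)

# Left join keeps every row of df1; df2's Symbol arrives as "Symbol_".
merged = df1.merge(df2, on=["Name", "ID"], how="left", suffixes=("", "_"))

# Fill the gaps in Symbol from Symbol_, dropping the helper column as we go.
merged["Symbol"] = merged["Symbol"].fillna(merged.pop("Symbol_"))
print(merged)
```

Rows with no match in df2 (Ray and Maye here) stay NaN, and Trey keeps FFF because fillna only touches missing values.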
Another solution using DataFrame.update:
df1 = df1.set_index(['Name','ID'])
df2 = df2.set_index(['Name','ID'])
df1.update(df2)
df1 = df1.reset_index()
print(df1)
   Name       ID Symbol
0   Jay  372Y105    AAA
1   Ray  4446100    NaN
2  Faye  484MAA4    CCC
3  Maye  504W308    NaN
4   Kay  782L107    EEE
5  Trey  782L111    FFF
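Since the question asks to match on ID alone, a lookup with Series.map might also be worth considering. This is not one of the solutions above but an alternative sketch. It assumes the IDs in df2 are unique (map raises an error when the lookup index contains duplicates), and `lookup` is just an illustrative name.

```python
import pandas as pd

# Sample data copied from the question; "N/A" is kept as a literal string here.
df1 = pd.DataFrame({
    "Name": ["Jay", "Ray", "Faye", "Maye", "Kay", "Trey"],
    "Symbol": ["N/A", "N/A", "N/A", "N/A", "N/A", "FFF"],
    "ID": ["372Y105", "4446100", "484MAA4", "504W308", "782L107", "782L111"],
})
df2 = pd.DataFrame({
    "Name": ["Jay", "Faye", "Kay"],
    "Symbol": ["AAA", "CCC", "EEE"],
    "ID": ["372Y105", "484MAA4", "782L107"],
})

# Build an ID -> Symbol lookup from df2 (assumes each ID appears once in df2).
lookup = df2.set_index("ID")["Symbol"]

# IDs found in df2 take df2's Symbol; unmatched IDs map to NaN
# and then fall back to the value df1 already had.
df1["Symbol"] = df1["ID"].map(lookup).fillna(df1["Symbol"])
print(df1)
```

Because unmatched rows fall back to df1's original value, this variant should behave the same whether the placeholder is NaN or the string 'N/A', and it leaves the column order of df1 unchanged.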