I have two DataFrames, and I need to conditionally update specific columns in the first one.
import numpy as np
import pandas as pd
df1 = pd.DataFrame([[1,'Foo',1,1,1,np.nan,np.nan,np.nan],[2,'Foo',2,2,2,np.nan,np.nan,np.nan],[3,'Bar',3,3,3,np.nan,np.nan,np.nan]], columns = ['Key','identifier','A','B','C','D','E','F'])
print(df1)
   Key identifier  A  B  C   D   E   F
0    1        Foo  1  1  1 NaN NaN NaN
1    2        Foo  2  2  2 NaN NaN NaN
2    3        Bar  3  3  3 NaN NaN NaN
df2 = pd.DataFrame([[1,np.nan,10,10,10,5,6,7],[2,np.nan,12,12,12,8,9,10],[3,np.nan,13,13,13,11,12,13]], columns = ['Key','identifier','A','B','C','D','E','F'])
print(df2)
   Key  identifier   A   B   C   D   E   F
0    1         NaN  10  10  10   5   6   7
1    2         NaN  12  12  12   8   9  10
2    3         NaN  13  13  13  11  12  13
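Where the identifier column in df1 equals 'Foo', I need to update columns D, E and F of df1 with the corresponding columns from df2. How can I conditionally update these three columns?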
df3 = #code here
Desired output:
print(df3)
   Key identifier  A  B  C    D    E     F
0    1        Foo  1  1  1  5.0  6.0   7.0
1    2        Foo  2  2  2  8.0  9.0  10.0
2    3        Bar  3  3  3  NaN  NaN   NaN
Follow-up
Suppose df1 is instead as follows:
df1 = pd.DataFrame([[1,'Foo',1,1,1,np.nan,np.nan,np.nan],[4,'Bar',4,4,4,np.nan,np.nan,np.nan],[2,'Foo',2,2,2,np.nan,np.nan,np.nan],[3,'Bar',3,3,3,np.nan,np.nan,np.nan]], columns = ['Key','identifier','A','B','C','D','E','F'])
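Now df1 and df2 have different lengths, and the positions of the records to update no longer match. How can this still be done? I get the following output: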
df2[df1['identifier'] == 'Foo'].combine_first(df1)
   Key identifier     A     B     C     D     E     F
0  1.0        Foo  10.0  10.0  10.0   5.0   6.0   7.0
1  4.0        Bar   4.0   4.0   4.0   NaN   NaN   NaN
2  3.0        Foo  13.0  13.0  13.0  11.0  12.0  13.0
3  3.0        Bar   3.0   3.0   3.0   NaN   NaN   NaN
Answer 0 (score: 2)
Use combine_first after setting Key as the index of both DataFrames with set_index.
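The setup step described above can be sketched as follows. This is a minimal, runnable version that uses the question's original three-row data; the names cols and is_foo are illustrative and do not come from the answer, and a plain boolean comparison stands in for the eval call.

```python
import numpy as np
import pandas as pd

cols = ['Key', 'identifier', 'A', 'B', 'C', 'D', 'E', 'F']
df1 = pd.DataFrame([[1, 'Foo', 1, 1, 1, np.nan, np.nan, np.nan],
                    [2, 'Foo', 2, 2, 2, np.nan, np.nan, np.nan],
                    [3, 'Bar', 3, 3, 3, np.nan, np.nan, np.nan]], columns=cols)
df2 = pd.DataFrame([[1, np.nan, 10, 10, 10, 5, 6, 7],
                    [2, np.nan, 12, 12, 12, 8, 9, 10],
                    [3, np.nan, 13, 13, 13, 11, 12, 13]], columns=cols)

# Index both frames by Key so that later operations align on Key
# rather than on row position.
df1 = df1.set_index('Key')
df2 = df2.set_index('Key')

# Boolean mask (indexed by Key) of the rows whose identifier is 'Foo'.
is_foo = df1['identifier'] == 'Foo'

# Non-null values from the selected df2 rows take priority;
# everything else is filled in from df1.
df3 = df2[is_foo].combine_first(df1)
print(df3)
```

Because both frames share the Key index, the mask built from df1 selects the matching Keys in df2.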
df1
    identifier  A  B  C   D   E   F
Key
1          Foo  1  1  1 NaN NaN NaN
2          Foo  2  2  2 NaN NaN NaN
3          Bar  3  3  3 NaN NaN NaN
df2
     identifier   A   B   C   D   E   F
Key
1           NaN  10  10  10   5   6   7
2           NaN  12  12  12   8   9  10
3           NaN  13  13  13  11  12  13
df2[df1.eval('identifier == "Foo"')].combine_first(df1)
    identifier     A     B     C    D    E     F
Key
1          Foo  10.0  10.0  10.0  5.0  6.0   7.0
2          Foo  12.0  12.0  12.0  8.0  9.0  10.0
3          Bar   3.0   3.0   3.0  NaN  NaN   NaN
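Note that the combine_first result above also takes A, B and C from df2, whereas the desired output in the question keeps those columns from df1. One possible alternative, which is not part of the original answer, is a label-based assignment with .loc restricted to D, E and F. The sketch below uses the follow-up df1 (four rows, different order) and assumes that Key values are unique in both frames; the names left, right, target and foo_keys are illustrative.

```python
import numpy as np
import pandas as pd

cols = ['Key', 'identifier', 'A', 'B', 'C', 'D', 'E', 'F']
df1 = pd.DataFrame([[1, 'Foo', 1, 1, 1, np.nan, np.nan, np.nan],
                    [4, 'Bar', 4, 4, 4, np.nan, np.nan, np.nan],
                    [2, 'Foo', 2, 2, 2, np.nan, np.nan, np.nan],
                    [3, 'Bar', 3, 3, 3, np.nan, np.nan, np.nan]], columns=cols)
df2 = pd.DataFrame([[1, np.nan, 10, 10, 10, 5, 6, 7],
                    [2, np.nan, 12, 12, 12, 8, 9, 10],
                    [3, np.nan, 13, 13, 13, 11, 12, 13]], columns=cols)

# Align on Key instead of row position (assumption: Key is unique in each frame).
left = df1.set_index('Key')
right = df2.set_index('Key')

target = ['D', 'E', 'F']

# Keys of the 'Foo' rows in df1, limited to Keys that also exist in df2.
foo_keys = left.index[left['identifier'] == 'Foo'].intersection(right.index)

# Copy df1 and overwrite only D, E, F for the selected Keys.
df3 = left.copy()
df3.loc[foo_keys, target] = right.loc[foo_keys, target]

# Restore Key as an ordinary column, keeping df1's row order.
df3 = df3.reset_index()
print(df3)
```

Since the assignment is keyed on Key labels, the differing lengths and row order of df1 and df2 do not matter, and rows such as Key 4 that have no counterpart in df2 are left unchanged.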