I have two DataFrame objects:
obj1:
header1 header2 header3 header4
1 A someValue1 someValue5
2 B someValue2 someValue6
3 C someValue3 someValue7
4 D someValue4 someValue8
obj2:
header1 header2 header3 header4
1 E someValue9 someValue13
2 F someValue10 someValue14
3 G someValue11 someValue15
4 H someValue10 someValue16
I want to update obj1 so that the values in columns header1 and header2 are kept, while columns header3 and header4 are set to the values from obj2.
For example:
header1 header2 header3 header4
1 A someValue9 someValue13
2 B someValue10 someValue14
3 C someValue11 someValue15
4 D someValue10 someValue16
What I tried:
for ID in obj2.header2:
    obj1[obj1.header1==ID].header3 = obj2[obj2.header1==ID].header3
    obj1[obj1.header1==ID].header4 = obj2[obj2.header1==ID].header4
However, this does not change anything in obj1; it is still the same as shown above.
Is there a good way to achieve my goal?
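Please note that these examples are abstract. The real IDs (a.k.a. header1) in obj1 and obj2 may not match one-to-one, so some IDs do not need to be updated. For example, obj1 has IDs 1, 2, 3, 4, 5 and obj2 has IDs 2, 3, 4, 5, so ID 1 in obj1 does not have to be updated.
Thanks a lot.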
Answer 0 (score: 1)
You can use merge and combine_first:
import pandas as pd

print(obj1)
ID header2 header3 header4
0 1 A someValue1 someValue5
1 2 B someValue2 someValue6
2 3 C someValue3 someValue7
3 4 D someValue4 someValue8
4 5 D1 someValue41 someValue81
print(obj2)
ID header2 header3 header4
0 2 E someValue9 someValue13
1 3 F someValue10 someValue14
2 4 G someValue11 someValue15
3 5 H someValue10 someValue16
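# Left-merge obj2 onto obj1 by ID; obj1's overlapping columns get the '_l' suffix.
# combine_first then fills the NaNs (IDs missing from obj2) from obj1, aligned by row index.
df = pd.merge(obj1, obj2, on=['ID'], suffixes=['_l', ''], how='left').combine_first(obj1)
print(df)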
ID header2 header2_l header3 header3_l header4 header4_l
0 1 A A someValue1 someValue1 someValue5 someValue5
1 2 E B someValue9 someValue2 someValue13 someValue6
2 3 F C someValue10 someValue3 someValue14 someValue7
3 4 G D someValue11 someValue4 someValue15 someValue8
4 5 H D1 someValue10 someValue41 someValue16 someValue81
# Keep only the unsuffixed (updated) columns
df = df[['ID','header2','header3','header4']]
print(df)
ID header2 header3 header4
0 1 A someValue1 someValue5
1 2 E someValue9 someValue13
2 3 F someValue10 someValue14
3 4 G someValue11 someValue15
4 5 H someValue10 someValue16
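Note that the output above also replaces header2 with obj2's values, whereas the question asks to keep header2. A minimal sketch of a variant that takes only header3 and header4 from obj2 could look like the following. The names update_cols, merged, and result and the "_old" suffix are illustrative choices, not part of the original answer, and the sketch assumes that ID is unique in obj2 (duplicate IDs there would duplicate rows in the merge).

```python
import pandas as pd

# Sample data copied from the answer's printed frames
obj1 = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "header2": ["A", "B", "C", "D", "D1"],
    "header3": ["someValue1", "someValue2", "someValue3", "someValue4", "someValue41"],
    "header4": ["someValue5", "someValue6", "someValue7", "someValue8", "someValue81"],
})
obj2 = pd.DataFrame({
    "ID": [2, 3, 4, 5],
    "header2": ["E", "F", "G", "H"],
    "header3": ["someValue9", "someValue10", "someValue11", "someValue10"],
    "header4": ["someValue13", "someValue14", "someValue15", "someValue16"],
})

# Hypothetical name: the columns that should be taken from obj2
update_cols = ["header3", "header4"]

# Left-merge only the columns to be replaced; obj1's versions get the "_old" suffix
merged = obj1.merge(obj2[["ID"] + update_cols], on="ID", how="left",
                    suffixes=("_old", ""))

# IDs missing from obj2 produce NaN in the new columns, so fall back to obj1's values
for col in update_cols:
    merged[col] = merged[col].fillna(merged[col + "_old"])

# header2 was never merged from obj2, so it still holds obj1's values
result = merged[["ID", "header2"] + update_cols]
print(result)
```

Here the row with ID 1 keeps its original header3 and header4, and every row keeps obj1's header2.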
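Alternatively, build a boolean mask with isin and assign through loc:
# True for the rows of obj1 whose ID also appears in obj2
mask = obj1.ID.isin(obj2.ID)
print(mask)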
0 False
1 True
2 True
3 True
4 True
Name: ID, dtype: bool
# Positional assignment: the masked rows of obj1 must match obj2 in count and order
obj1.loc[mask, obj1.columns] = obj2.values
print(obj1)
ID header2 header3 header4
0 1 A someValue1 someValue5
1 2 E someValue9 someValue13
2 3 F someValue10 someValue14
3 4 G someValue11 someValue15
4 5 H someValue10 someValue16
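The loc assignment above is positional: it relies on the masked rows of obj1 having the same count and order as the rows of obj2, and it overwrites every column, header2 included. A sketch that matches rows by ID instead, using set_index and DataFrame.update, might look like this. The names update_cols, left, right, and updated are hypothetical, and the sketch assumes that IDs are unique within each frame.

```python
import pandas as pd

# Sample data copied from the answer's printed frames
obj1 = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "header2": ["A", "B", "C", "D", "D1"],
    "header3": ["someValue1", "someValue2", "someValue3", "someValue4", "someValue41"],
    "header4": ["someValue5", "someValue6", "someValue7", "someValue8", "someValue81"],
})
obj2 = pd.DataFrame({
    "ID": [2, 3, 4, 5],
    "header2": ["E", "F", "G", "H"],
    "header3": ["someValue9", "someValue10", "someValue11", "someValue10"],
    "header4": ["someValue13", "someValue14", "someValue15", "someValue16"],
})

# Hypothetical name: the columns that should be taken from obj2
update_cols = ["header3", "header4"]

# Index both frames by ID so values are matched by label, not by row position.
# obj2 is reversed on purpose to show that its row order does not matter.
left = obj1.set_index("ID")
right = obj2.iloc[::-1].set_index("ID")[update_cols]

# DataFrame.update modifies `left` in place (it returns None), overwriting only
# cells whose (ID, column) label exists in `right` and is not NaN there
left.update(right)

updated = left.reset_index()
print(updated)
```

As for the loop in the question, it leaves obj1 unchanged because obj1[obj1.header1==ID] returns a new object, so the assignment lands on a temporary copy; assigning through obj1.loc[row_mask, column] writes to obj1 itself.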