How can I update part of a DataFrame with values from another DataFrame in Python?

Asked: 2016-03-07 13:20:01

Tags: python pandas dataframe

I have two DataFrame objects:

OBJ1:

header1    header2    header3     header4
1          A          someValue1  someValue5
2          B          someValue2  someValue6
3          C          someValue3  someValue7
4          D          someValue4  someValue8

OBJ2:

header1    header2    header3      header4
1          E          someValue9   someValue13
2          F          someValue10  someValue14
3          G          someValue11  someValue15
4          H          someValue10  someValue16

I want to update obj1 so that the values in columns header1 and header2 are kept, and columns header3 and header4 are set to the values from obj2.

For example:

header1    header2    header3      header4
1          A          someValue9   someValue13
2          B          someValue10  someValue14
3          C          someValue11  someValue15
4          D          someValue10  someValue16

What I tried:

for ID in obj2.header2:
    obj1[obj1.header1==ID].header3 = obj2[obj2.header1==ID].header3
    obj1[obj1.header1==ID].header4 = obj2[obj2.header1==ID].header4

However, this does not change anything in obj1; it stays the same as shown above.

Is there a good way to achieve this?

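Note that these examples are abstract: the real IDs (a.k.a. header1) in obj1 and obj2 may not match one-to-one, so some IDs do not need to be updated. For example, obj1 has IDs 1, 2, 3, 4, 5 and obj2 has IDs 2, 3, 4, 5, so ID 1 in obj1 does not have to be updated.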

Thanks a lot.

1 answer:

Answer 0: (score: 1)

You can use merge and combine_first:

import pandas as pd

print(obj1)
   ID header2      header3      header4
0   1       A   someValue1   someValue5
1   2       B   someValue2   someValue6
2   3       C   someValue3   someValue7
3   4       D   someValue4   someValue8
4   5      D1  someValue41  someValue81

print(obj2)
   ID header2      header3      header4
0   2       E   someValue9  someValue13
1   3       F  someValue10  someValue14
2   4       G  someValue11  someValue15
3   5       H  someValue10  someValue16



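# Left-merge on ID (obj1's columns get the suffix '_l'), then fill the
# rows that have no match in obj2 with the original values from obj1.
df = pd.merge(obj1, obj2, on=['ID'], suffixes=['_l', ''], how='left').combine_first(obj1)
print(df)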
   ID header2 header2_l      header3    header3_l      header4    header4_l
0   1       A         A   someValue1   someValue1   someValue5   someValue5
1   2       E         B   someValue9   someValue2  someValue13   someValue6
2   3       F         C  someValue10   someValue3  someValue14   someValue7
3   4       G         D  someValue11   someValue4  someValue15   someValue8
4   5       H        D1  someValue10  someValue41  someValue16  someValue81

# Keep only the unsuffixed (updated) columns.
df = df[['ID','header2','header3','header4']]
print(df)
   ID header2      header3      header4
0   1       A   someValue1   someValue5
1   2       E   someValue9  someValue13
2   3       F  someValue10  someValue14
3   4       G  someValue11  someValue15
4   5       H  someValue10  someValue16
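
Note that the result above also replaces header2 with the values from obj2 (E, F, G, H), whereas the question asks to keep header2 and only overwrite header3 and header4. One possible way to do that is DataFrame.update, which aligns rows on the index and overwrites only the columns it is given. The following is a minimal sketch, assuming the IDs are unique within each frame; the name `result` is only illustrative and does not come from the original answer.

```python
import pandas as pd

obj1 = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "header2": ["A", "B", "C", "D", "D1"],
    "header3": ["someValue1", "someValue2", "someValue3", "someValue4", "someValue41"],
    "header4": ["someValue5", "someValue6", "someValue7", "someValue8", "someValue81"],
})
obj2 = pd.DataFrame({
    "ID": [2, 3, 4, 5],
    "header2": ["E", "F", "G", "H"],
    "header3": ["someValue9", "someValue10", "someValue11", "someValue10"],
    "header4": ["someValue13", "someValue14", "someValue15", "someValue16"],
})

# Index by ID so that update() aligns rows on ID rather than on position.
# Assumption: IDs are unique in both frames.
result = obj1.set_index("ID")

# Pass only the columns that should be overwritten; header2 is left untouched.
# update() modifies `result` in place and skips IDs that are missing from obj2.
result.update(obj2.set_index("ID")[["header3", "header4"]])

result = result.reset_index()
print(result)
```

With this sketch the row for ID 1 keeps its original values, and obj1 itself is not modified because set_index returns a new frame.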

A solution with isin, loc and values:

# True for every row of obj1 whose ID also appears in obj2.
mask = obj1.ID.isin(obj2.ID.tolist())
print(mask)
0    False
1     True
2     True
3     True
4     True
Name: ID, dtype: bool

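# Overwrite the matching rows with obj2's raw values. The assignment is
# positional, so obj2's rows must be in the same order as the masked rows.
obj1.loc[mask, obj1.columns] = obj2.values
print(obj1)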
   ID header2      header3      header4
0   1       A   someValue1   someValue5
1   2       E   someValue9  someValue13
2   3       F  someValue10  someValue14
3   4       G  someValue11  someValue15
4   5       H  someValue10  someValue16
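
The assignment above copies obj2's values by position and overwrites every column, header2 included. It also shows why the loop in the question has no effect: `obj1[obj1.header1==ID].header3 = ...` assigns to a temporary copy, while a single `.loc` call with a row mask and a column list writes into obj1 itself. Below is a hedged sketch of a variant that restricts the update to header3 and header4 and looks the values up by ID, so the row order of obj2 does not matter. The names `cols`, `lookup` and `ids` are illustrative, and unique IDs in obj2 are assumed.

```python
import pandas as pd

obj1 = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "header2": ["A", "B", "C", "D", "D1"],
    "header3": ["someValue1", "someValue2", "someValue3", "someValue4", "someValue41"],
    "header4": ["someValue5", "someValue6", "someValue7", "someValue8", "someValue81"],
})
# Same data as obj2 in the answer, but with the rows deliberately shuffled.
obj2 = pd.DataFrame({
    "ID": [5, 2, 4, 3],
    "header2": ["H", "E", "G", "F"],
    "header3": ["someValue10", "someValue9", "someValue11", "someValue10"],
    "header4": ["someValue16", "someValue13", "someValue15", "someValue14"],
})

cols = ["header3", "header4"]          # columns to overwrite
lookup = obj2.set_index("ID")[cols]    # assumption: IDs in obj2 are unique

# Rows of obj1 that have a counterpart in obj2.
mask = obj1["ID"].isin(lookup.index)
ids = obj1.loc[mask, "ID"]

# Fetch the replacement rows in obj1's own ID order, then assign them as a
# plain array so that the positions line up with the masked rows.
obj1.loc[mask, cols] = lookup.loc[ids].to_numpy()
print(obj1)
```

Unlike the previous sketch, this one modifies obj1 in place, which is closer to what the loop in the question was trying to do.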