Pandas: update multiple columns in a dataframe based on values in another dataframe

Time: 2019-02-15 21:58:26

Tags: python pandas dataframe updating

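I have two dataframes of different sizes. I need to update msg_count in df1 from df2, but only for rows where the [UserID, Month] column values of df1 and df2 match.

My data looks like this: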

df1:
UserID  Month           A       B       C       D       E       F  msg_count

knaas    1/1/2017       0       0       0       0       0       0    0  
knaas    2/1/2017       0       0       0       0       0       0    0
knaas    3/1/2017       0       0       0       0       0       0    0
knaas    4/1/2017       0       0       0       2       0       0    0
knaas    5/1/2017       0       0       0       0       0       0    0
knaas    6/1/2017       0       0       0       0       0       0    0
knaas    7/1/2017       0       0       0       0       0       0    0
knaas    8/1/2017       0       0       0       0       0       0    0
knaas    9/1/2017       0       0       0       0       0       0    0
knaas    10/1/2017      0       0       0       0       0       0    0
knaas    11/1/2017      0       0       0       0       0       0    0
knaas    12/1/2017      0       0       0       0       0       0    0
ArtCort0324 1/1/2017    0       0       0       0       0       0    0 
ArtCort0324 2/1/2017    0       2       0       2       0       0    0 
ArtCort0324 3/1/2017    0       0       0       0       0       0    0 
ArtCort0324 4/1/2017    0       1       1       0       0       0    0
ArtCort0324 5/1/2017    0       0       0       3       0       0    0
ArtCort0324 6/1/2017    0       0       0       0       0       0    9 

df2:
  UserID           Month    msg_count       
  ArtCort0324   1/1/2017    0    
  ArtCort0324   2/1/2017    0    
  ArtCort0324   3/1/2017    0    
  ArtCort0324   4/1/2017    0    
  ArtCort0324   5/1/2017    0    
  ArtCort0324   6/1/2017    9    
  ArtCort0324   7/1/2017    0    
  ArtCort0324   8/1/2017    0    
  ArtCort0324   9/1/2017    0    
  ArtCort0324   10/1/2017   0     
  ArtCort0324   11/1/2017   0    
  ArtCort0324   12/1/2017   0     

I tried the following code snippets, but they do not work as expected:

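import numpy as np
import pandas as pd

# Attempt 1: combine_first on a (UserID, Month) index
res = df2.set_index(['UserID', 'Month'])\
     .combine_first(df1.set_index(['UserID', 'Month']))\
     .reset_index()

# Attempt 2: left merge, then prefer msg_count_new where it is not null
# (gitter is presumably the frame shown above as df2)
updated_new = df1.merge(gitter, how='left', on=['UserID', 'Month'],
                        suffixes=('', '_new'))
updated_new['msg_count'] = np.where(
    pd.notnull(updated_new['msg_count_new']),
    updated_new['msg_count_new'], updated_new['msg_count'])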

I need the following output:

UserID  Month           A       B       C       D       E       F  msg_count

knaas   1/1/2017        0       0       0       0       0       0     0    
knaas   2/1/2017        0       0       0       0       0       0     0    
knaas   3/1/2017        0       0       0       0       0       0     0    
knaas   4/1/2017        0       0       0       2       0       0     0    
knaas   5/1/2017        0       0       0       0       0       0     0    
knaas   6/1/2017        0       0       0       0       0       0     0    
knaas   7/1/2017        0       0       0       0       0       0     0    
knaas   8/1/2017        0       0       0       0       0       0     0    
knaas   9/1/2017        0       0       0       0       0       0     0     
knaas   10/1/2017       0       0       0       0       0       0     0    
knaas   11/1/2017       0       0       0       0       0       0     0    
knaas   12/1/2017       0       0       0       0       0       0     0    
ArtCort0324  1/1/2017   0       0       0       0       0       0     0    
ArtCort0324  2/1/2017   1       0       0       0       0       0     0    
ArtCort0324  3/1/2017   0       0       0       0       0       0     50    
ArtCort0324  4/1/2017   0       0       0       0       0       0     0   

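I have added a default column msg_count to df1, with a default value of 0. I need to update msg_count in df1 with the value of msg_count from df2 only when UserID and Month are equal in both dataframes.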
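A minimal sketch of this requirement, assuming the (UserID, Month) pairs are unique in df2 and that Month is stored in the same format in both frames. The small frames below are stand-ins built from a few of the rows shown above, and the names `keys`, `indexed`, and `updated` are illustrative only. It relies on `DataFrame.update`, which aligns on the index and overwrites values in place.

```python
import pandas as pd

# Small stand-ins for df1 and df2 (a subset of the rows shown above).
df1 = pd.DataFrame({
    "UserID": ["knaas", "knaas", "ArtCort0324", "ArtCort0324"],
    "Month": ["5/1/2017", "6/1/2017", "5/1/2017", "6/1/2017"],
    "D": [0, 0, 3, 0],
    "msg_count": [0, 0, 0, 0],
})
df2 = pd.DataFrame({
    "UserID": ["ArtCort0324", "ArtCort0324", "ArtCort0324"],
    "Month": ["5/1/2017", "6/1/2017", "7/1/2017"],
    "msg_count": [0, 9, 0],
})

keys = ["UserID", "Month"]

# Index both frames by the key columns so rows align on (UserID, Month).
indexed = df1.set_index(keys)

# update() modifies `indexed` in place. It only overwrites where df2 has a
# non-NaN value for a matching index label; keys found only in df2 are ignored.
indexed.update(df2.set_index(keys)[["msg_count"]])

updated = indexed.reset_index()
# update() may upcast the column to float in some pandas versions, so cast back.
updated["msg_count"] = updated["msg_count"].astype("int64")
print(updated)
```

Because `update` never adds rows, the result keeps exactly the rows and row order of df1; only msg_count changes where a key matches.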

1 answer:

Answer 0: (score: 0)

It sounds like you want merge:

df_merge = pd.merge(left=df1, right=df2, on=['UserID', 'Month'], how='left')

You may want to set how to 'inner', 'outer', etc., depending on which rows you want to keep.
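Since both frames have a msg_count column, a plain merge returns two suffixed copies of it. The sketch below is one way to collapse them back into a single column, and it also compares row counts for the different how values. It uses small stand-in frames built from a few of the rows in the question and assumes (UserID, Month) is unique in df2; the `_new` suffix and the names `merged`, `result`, `inner_rows`, and `outer_rows` are illustrative choices.

```python
import pandas as pd

# Small stand-ins for df1 and df2 (a subset of the rows in the question).
df1 = pd.DataFrame({
    "UserID": ["knaas", "knaas", "ArtCort0324", "ArtCort0324"],
    "Month": ["5/1/2017", "6/1/2017", "5/1/2017", "6/1/2017"],
    "D": [0, 0, 3, 0],
    "msg_count": [0, 0, 0, 0],
})
df2 = pd.DataFrame({
    "UserID": ["ArtCort0324", "ArtCort0324", "ArtCort0324"],
    "Month": ["5/1/2017", "6/1/2017", "7/1/2017"],
    "msg_count": [0, 9, 0],
})

keys = ["UserID", "Month"]

# how="left" keeps every df1 row. df2's msg_count arrives as msg_count_new
# and is NaN where df2 has no matching (UserID, Month).
merged = df1.merge(df2, on=keys, how="left", suffixes=("", "_new"))

# Prefer the df2 value when present, otherwise keep df1's original value.
merged["msg_count"] = (
    merged["msg_count_new"].fillna(merged["msg_count"]).astype("int64")
)
result = merged.drop(columns="msg_count_new")
print(result)

# For comparison: an inner join keeps only keys present in both frames,
# while an outer join also adds keys that exist only in df2.
inner_rows = len(df1.merge(df2, on=keys, how="inner"))
outer_rows = len(df1.merge(df2, on=keys, how="outer"))
print(inner_rows, outer_rows)
```

With how='left' the result has one row per df1 row as long as the keys are unique in df2; duplicate keys in df2 would duplicate the matching df1 rows.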