Question

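I have two DataFrames, df1 and df2. df1 has the columns ['UserId', 'Company', 'deg'] and 100 observations. df2 has the columns ['UserId', 'deg'] and 10 observations. The index in df1 and df2 matches exactly, and so does 'UserId'.

I want to update df1 with the values from df2. The 'UserId' column in df2 is a subset of the 'UserId' column in df1, so there is nothing to append. The update should be based only on 'UserId' (and/or the plain index).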

DF1

,'UserId','Company','deg'
6,'john21','ibm','bs'
12,'mary33','cisco','ms'
16,'smith11','intel','none'
20,'lucy55','intel','bs'
33,'tanya32','fb','ms'
39,'ssri44','google','none'
45,'har43','CDs','none'

DF2

,'UserId','deg'
16,'smith11','BS'
39,'ssri44','MS'
45,'har43','MS'

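Now I want to use the information in df2 to update df1. As you can see, the index values and the UserId values in df2 match those in df1 exactly.

Any suggestions?

Thanks!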

Answer 1

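You can first replace 'none' with NaN, then use fillna to fill the NaN values in df1 with the values from df2:

import numpy as np

# Replace the placeholder string (stored here with embedded quotes) with NaN.
df1.replace({"'none'": np.nan}, inplace=True)
# If your values are stored without the embedded quotes, use this instead
# (the version above is the one that worked for me):
#df1.replace({"none": np.nan}, inplace=True)

# fillna aligns df2 to df1 on index and column labels.
print(df1.fillna(df2))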

     'UserId' 'Company' 'deg'
6    'john21'     'ibm'  'bs'
12   'mary33'   'cisco'  'ms'
16  'smith11'   'intel'  'BS'
20   'lucy55'   'intel'  'bs'
33  'tanya32'      'fb'  'ms'
39   'ssri44'  'google'  'MS'
45    'har43'     'CDs'  'MS'
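
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the fillna approach. It assumes the values are stored without the embedded quote characters shown in the question, and the variable name filled is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question, without the embedded quote characters.
df1 = pd.DataFrame(
    {
        "UserId": ["john21", "mary33", "smith11", "lucy55", "tanya32", "ssri44", "har43"],
        "Company": ["ibm", "cisco", "intel", "intel", "fb", "google", "CDs"],
        "deg": ["bs", "ms", "none", "bs", "ms", "none", "none"],
    },
    index=[6, 12, 16, 20, 33, 39, 45],
)
df2 = pd.DataFrame(
    {"UserId": ["smith11", "ssri44", "har43"], "deg": ["BS", "MS", "MS"]},
    index=[16, 39, 45],
)

# Turn the "none" placeholder into a real missing value, then fill the gaps
# from df2. fillna aligns on both index labels and column names, so only
# rows 16, 39 and 45 of the "deg" column are touched.
filled = df1.replace({"none": np.nan}).fillna(df2)
print(filled)
```

Because fillna only fills missing cells, existing values such as 'bs' in row 6 are left alone.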

Another solution uses update:

df1.replace({"'none'": np.nan}, inplace=True)

# update modifies df1 in place, aligning df2 on the index.
df1.update(df2)
print(df1)
     'UserId' 'Company' 'deg'
6    'john21'     'ibm'  'bs'
12   'mary33'   'cisco'  'ms'
16  'smith11'   'intel'  'BS'
20   'lucy55'   'intel'  'bs'
33  'tanya32'      'fb'  'ms'
39   'ssri44'  'google'  'MS'
45    'har43'     'CDs'  'MS'
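
One detail worth knowing: with its default overwrite=True, update replaces any aligned cell for which df2 has a non-NA value, so it also overwrites the placeholder string directly. The sketch below illustrates this on a shortened version of the sample data, again assuming values without embedded quotes.

```python
import pandas as pd

# Shortened sample data; "none" is left as a plain string on purpose.
df1 = pd.DataFrame(
    {
        "UserId": ["john21", "mary33", "smith11", "ssri44"],
        "Company": ["ibm", "cisco", "intel", "google"],
        "deg": ["bs", "ms", "none", "none"],
    },
    index=[6, 12, 16, 39],
)
df2 = pd.DataFrame(
    {"UserId": ["smith11", "ssri44"], "deg": ["BS", "MS"]},
    index=[16, 39],
)

# update works in place and returns None; it aligns on index and columns
# and copies every non-NA value from df2 over the matching cell in df1.
df1.update(df2)
print(df1)
```

Rows and columns that do not appear in df2 (here rows 6 and 12, and the Company column) stay unchanged.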

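If you instead want to bring the information from df1 into df2, use merge on the index:

# Merge on the index only; passing on= together with left_index/right_index raises an error.
print(pd.merge(df2, df1[["'Company'"]], left_index=True, right_index=True, how='left'))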
     'UserId' 'deg' 'Company'
16  'smith11'  'BS'   'intel'
39   'ssri44'  'MS'  'google'
45    'har43'  'MS'     'CDs'
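
The same index-based lookup can also be written with join, which performs a left join on the index by default. This is only a sketch on shortened sample data, assuming values without embedded quotes; the name enriched is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "UserId": ["john21", "smith11", "ssri44", "har43"],
        "Company": ["ibm", "intel", "google", "CDs"],
        "deg": ["bs", "none", "none", "none"],
    },
    index=[6, 16, 39, 45],
)
df2 = pd.DataFrame(
    {"UserId": ["smith11", "ssri44", "har43"], "deg": ["BS", "MS", "MS"]},
    index=[16, 39, 45],
)

# Select only the column that df2 lacks, then left-join it on the index.
# The index labels of df2 (16, 39, 45) are preserved in the result.
enriched = df2.join(df1[["Company"]])
print(enriched)
```

Selecting just the Company column first avoids a clash between the UserId and deg columns that exist in both frames.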
