Best way to combine two dataframes by updating duplicate index entries and outer joining the remaining data

Time: 2019-12-27 15:16:41

Tags: python pandas merge

What is the best way to combine the following two dataframes? I want:

  • desired_df to use the price from new_df for any (security, date) index entry that also appears in old_df (e.g. update stock1 below)
  • desired_df to retain all entries in old_df that do not appear in new_df (keep stock3)
  • desired_df to include all entries in new_df that do not appear in old_df (add stock2)

Here is an example of what I am looking for. These lines build old_df, new_df, and my desired_df:

import pandas as pd

old_df = pd.DataFrame({'security': ['stock1', 'stock3'], 'date': ['2019-12-23', '2019-12-23'], 'price': [10, 9]}).set_index(['security', 'date'])
new_df = pd.DataFrame({'security': ['stock1', 'stock2'], 'date': ['2019-12-23', '2019-12-24'], 'price': [11, 12]}).set_index(['security', 'date'])
# stock1 takes the new_df price (11), stock2 is added (12), stock3 keeps its old_df price (9)
desired_df = pd.DataFrame({'security': ['stock1', 'stock2', 'stock3'], 'date': ['2019-12-23', '2019-12-24', '2019-12-23'], 'price': [11, 12, 9]}).set_index(['security', 'date'])
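The three requirements can also be read as set operations on the (security, date) index. The sketch below is only an illustration of that reading, using the sample data from the question; the names in_both, only_old, and only_new are made up for this example.

```python
import pandas as pd

old_df = pd.DataFrame({
    'security': ['stock1', 'stock3'],
    'date': ['2019-12-23', '2019-12-23'],
    'price': [10, 9],
}).set_index(['security', 'date'])

new_df = pd.DataFrame({
    'security': ['stock1', 'stock2'],
    'date': ['2019-12-23', '2019-12-24'],
    'price': [11, 12],
}).set_index(['security', 'date'])

# Entries present in both frames: the price should come from new_df.
in_both = old_df.index.intersection(new_df.index)
# Entries only in old_df: keep them unchanged.
only_old = old_df.index.difference(new_df.index)
# Entries only in new_df: add them.
only_new = new_df.index.difference(old_df.index)

print(in_both.tolist())
print(only_old.tolist())
print(only_new.tolist())
```

With this data, stock1 falls in the first group, stock3 in the second, and stock2 in the third.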

1 Answer:

Answer 0: (score: 2)

IIUC, you can use combine_first:

desired_df = new_df.combine_first(old_df)

                     price
security date             
stock1   2019-12-23   11.0
stock2   2019-12-24   12.0
stock3   2019-12-23    9.0
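For reference, here is a self-contained sketch of the same call on the question's sample data. The prices in the output above are floats, presumably because aligning the two frames introduces missing values along the way; whether that happens may depend on the pandas version, so the cast back to int64 below is an optional step that is safe here only because no NaN remains. The concat-based variant (alt_df) is an alternative added for comparison, not part of the original answer.

```python
import pandas as pd

old_df = pd.DataFrame({
    'security': ['stock1', 'stock3'],
    'date': ['2019-12-23', '2019-12-23'],
    'price': [10, 9],
}).set_index(['security', 'date'])

new_df = pd.DataFrame({
    'security': ['stock1', 'stock2'],
    'date': ['2019-12-23', '2019-12-24'],
    'price': [11, 12],
}).set_index(['security', 'date'])

# new_df values win; old_df only fills entries that new_df lacks.
desired_df = new_df.combine_first(old_df)

# Optional: restore the integer dtype (valid here because no NaN is left).
desired_df['price'] = desired_df['price'].astype('int64')
print(desired_df)

# Alternative sketch: stack both frames, then keep the last occurrence of
# each index entry so that rows from new_df override rows from old_df.
alt_df = pd.concat([old_df, new_df])
alt_df = alt_df[~alt_df.index.duplicated(keep='last')].sort_index()
print(alt_df)
```

Note that combine_first only falls back to old_df where new_df has no value, so a NaN price in new_df would be filled from old_df, whereas the concat variant would keep the NaN.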