Question

I have two data frames.

Data frame A (newest)

Key1 Key2
-------------
L1   Value1
L2   Value2
L3   Value3

Data frame B (old)

Key1 Key2
--------------
L1   ValueOld1
L2   ValueOld2
L3   ValueOld3
R1   ValueOld1
R2   ValueOld2
R3   ValueOld3

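I want a merge(A, B) operation that combines data frames A and B so that, for any Key1 present in both, the row from A wins.

This could be called a union with overwriting newest values. I want this merge(A, B) method to be robust and to handle the following cases for different B data frames: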

Data frame A

Key1 Key2
-------------
L1   Value1
L2   Value2
L3   Value3

Data frame B

Key1 Key2
--------------
L1   ValueOld1
L2   ValueOld2
L3   ValueOld3
R1   ValueOld1
R2   ValueOld2
R3   ValueOld3

merge(A, B)

Key1 Key2
--------------
L1   Value1
L2   Value2
L3   Value3
R1   ValueOld1
R2   ValueOld2
R3   ValueOld3

Data frame B

Key1 Key2
--------------
L1   ValueOld1
L2   ValueOld2
R1   ValueOld1

merge(A, B)

Key1 Key2
--------------
L1   Value1
L2   Value2
L3   Value3
R1   ValueOld1

Data frame B

Key1 Key2
--------------
L1   ValueOld1
L2   ValueOld2

merge(A, B)

Key1 Key2
--------------
L1   Value1
L2   Value2
L3   Value3

Last but not least, merge(A, B) should preserve the Key1 order of the data frames. How can I achieve this with pandas?

Answer 1

concat, then drop_duplicates:

pd.concat((new,old)).drop_duplicates('Key1')
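
For anyone who wants to run this, here is a minimal self-contained sketch of the one-liner. It assumes `new` is data frame A and `old` is data frame B from the question's first case; the `reset_index` call is an optional extra, not part of the line above.

```python
import pandas as pd

# Frames rebuilt from the question's first case: `new` is A, `old` is B.
new = pd.DataFrame({"Key1": ["L1", "L2", "L3"],
                    "Key2": ["Value1", "Value2", "Value3"]})
old = pd.DataFrame({"Key1": ["L1", "L2", "L3", "R1", "R2", "R3"],
                    "Key2": ["ValueOld1", "ValueOld2", "ValueOld3",
                             "ValueOld1", "ValueOld2", "ValueOld3"]})

# `new` is concatenated first, and drop_duplicates keeps the first
# occurrence by default, so the row from `new` wins for a shared Key1.
merged = pd.concat((new, old)).drop_duplicates("Key1")

# Optional: renumber the index after dropping rows.
merged = merged.reset_index(drop=True)
print(merged)
```

Note that this relies on Key1 being unique within `new`; duplicate keys inside `new` itself would also be dropped. The result lists the rows of `new` in their order, followed by the remaining rows of `old` in their order.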

Alternatively, use isin to find which Key1 values in the old df are not present in the new one, then concat:

pd.concat((new,old[~old['Key1'].isin(new['Key1'])]))

  Key1       Key2
0   L1     Value1
1   L2     Value2
2   L3     Value3
3   R1  ValueOld1
4   R2  ValueOld2
5   R3  ValueOld3
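
As a rough check against all three cases in the question, the isin approach can be wrapped in a small function. The name `merge_newest` and its `key` parameter are hypothetical, chosen only for this sketch, and `ignore_index=True` is an added assumption to get a clean 0..n-1 index.

```python
import pandas as pd

def merge_newest(new, old, key="Key1"):
    """Hypothetical helper: all rows of `new`, then rows of `old` whose key is absent from `new`."""
    only_old = old[~old[key].isin(new[key])]
    return pd.concat((new, only_old), ignore_index=True)

# Data frame A from the question.
a = pd.DataFrame({"Key1": ["L1", "L2", "L3"],
                  "Key2": ["Value1", "Value2", "Value3"]})

# Full data frame B from the first case; the other cases are subsets of it.
b_full = pd.DataFrame({"Key1": ["L1", "L2", "L3", "R1", "R2", "R3"],
                       "Key2": ["ValueOld1", "ValueOld2", "ValueOld3",
                                "ValueOld1", "ValueOld2", "ValueOld3"]})
b_case2 = b_full[b_full["Key1"].isin(["L1", "L2", "R1"])]
b_case3 = b_full[b_full["Key1"].isin(["L1", "L2"])]

result1 = merge_newest(a, b_full)
result2 = merge_newest(a, b_case2)
result3 = merge_newest(a, b_case3)

for result in (result1, result2, result3):
    print(result, end="\n\n")
```

Unlike the drop_duplicates variant, this one leaves any duplicate Key1 values inside `new` untouched, since only rows of `old` are filtered.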