Question

I'm trying to do something that I think should be a one-liner, but I'm struggling to get it right.

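I have a large DataFrame, call it lg, and a small DataFrame, call it sm. Each has a start and an end column, plus several other columns, all of which are the same across the two DataFrames (for simplicity, we'll call all of those columns type). Sometimes a row in sm has the same start and end as a row in lg, and when that happens I want sm's type to overwrite lg's type.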

Here is the setup:

import pandas as pd

lg = pd.DataFrame({'start':[1,2,3,4], 'end':[5,6,7,8], 'type':['a','b','c','d']})
sm = pd.DataFrame({'start':[9,2,3], 'end':[10,6,11], 'type':['e','f','g']})

...note that the only matching ['start','end'] combination is [2, 6].

The output I want:

   start  end type
0      1    5    a
1      2    6    f    # where sm['type'] overwrites lg['type'] because of matching ['start','end']
2      3    7    c
3      3   11    g    # where there is no overwrite because 'end' does not match
4      4    8    d
5      9   10    e    # where this row is added from sm

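I've tried several versions of .merge(), merge_ordered(), and so on, to no avail. I actually got it working with merge_ordered() and drop_duplicates(), only to realize that it was dropping whichever duplicate came earlier in the alphabet, rather than keeping the row because it came from sm.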

Answer 1

You can try setting the start and end columns as the index and then using combine_first:

sm.set_index(['start', 'end']).combine_first(lg.set_index(['start', 'end'])).reset_index()
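
For reference, here is a self-contained sketch of that one-liner applied to the lg and sm frames from the question. It also shows a concat-based variant for comparison. The variable names keys, result and alt are only illustrative, and the explicit sort is there just to make the row order easy to compare with the desired output.

```python
import pandas as pd

lg = pd.DataFrame({'start': [1, 2, 3, 4], 'end': [5, 6, 7, 8], 'type': ['a', 'b', 'c', 'd']})
sm = pd.DataFrame({'start': [9, 2, 3], 'end': [10, 6, 11], 'type': ['e', 'f', 'g']})

keys = ['start', 'end']  # illustrative name for the columns that identify a row

# combine_first keeps the caller's (sm's) values wherever they are non-null
# and falls back to lg for index entries that sm does not have.
result = (
    sm.set_index(keys)
      .combine_first(lg.set_index(keys))
      .reset_index()
      .sort_values(keys)
      .reset_index(drop=True)
)
print(result)

# Alternative sketch: stack lg first and sm last, then keep the last row
# for each (start, end) pair so that sm wins on a match.
alt = (
    pd.concat([lg, sm])
      .drop_duplicates(subset=keys, keep='last')
      .sort_values(keys)
      .reset_index(drop=True)
)
print(alt)
```

One difference to keep in mind: combine_first only fills positions where sm is null, so a missing value in sm would not overwrite the corresponding lg value, whereas the concat variant always keeps the whole sm row.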

Pandas: merge a small DataFrame into a large one, overwriting with the small one
