Python Pandas - Merging two DataFrames with newer and older rows

Time: 2018-04-09 18:00:35

Tags: python pandas dataframe

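I have two DataFrames whose rows share the same (corresponding) index, and I would like to merge them. Each row has an update time. For rows with the same index, the row with the higher update time should win: all fields are taken from the "newer" row, except for fields that only have a value in the "older" row. For example: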

import pandas as pd

df1 = pd.DataFrame({'Hugo' : {'age' : 21, 'weight' : 75},
                   'Niklas': {'age' : 46, 'weight' : 65},
                   'Ronald' : {'age' : 76, 'weight' : 85, 'height' : 176}}).T
df1.index.names = ['name']
df1['update_time'] = 1

df2 = pd.DataFrame({'Hugo' : {'age' : 22, 'weight' : 77},
                   'Bertram': {'age' : 45, 'weight' : 65, 'height' : 190},
                   'Donald' : {'age' : 75, 'weight' : 85},
                   'Ronald' : {'age' : 77, 'weight' : 84}}).T
df2.index.names = ['name']
df2['update_time'] = 2


df1:
+--------+-------+----------+----------+---------------+
| name   |   age |   height |   weight |   update_time |
|--------+-------+----------+----------+---------------|
| Hugo   |    21 |      nan |       75 |             1 |
| Niklas |    46 |      nan |       65 |             1 |
| Ronald |    76 |      176 |       85 |             1 |
+--------+-------+----------+----------+---------------+
df2:
+---------+-------+----------+----------+---------------+
| name    |   age |   height |   weight |   update_time |
|---------+-------+----------+----------+---------------|
| Bertram |    45 |      190 |       65 |             2 |
| Donald  |    75 |      nan |       85 |             2 |
| Hugo    |    22 |      nan |       77 |             2 |
| Ronald  |    77 |      nan |       84 |             2 |
+---------+-------+----------+----------+---------------+

The result should look like this:

+---------+-------+----------+----------+---------------+
| name    |   age |   height |   weight |   update_time |
|---------+-------+----------+----------+---------------|
| Niklas  |    46 |      nan |       65 |             1 |
| Bertram |    45 |      190 |       65 |             2 |
| Donald  |    75 |      nan |       85 |             2 |
| Hugo    |    22 |      nan |       77 |             2 |
| Ronald  |    77 |      176 |       84 |             2 |
+---------+-------+----------+----------+---------------+

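How can I do this? The difficulty is keeping fields such as Ronald's height, which only exists in the older row. If I first call df.update on df1, the old timestamp is overwritten and I can no longer find the older duplicates. If I use df.append, I cannot merge the fields.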

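1 answer:

Answer 0: (score: 5)

Use combine_first. Called on df2, it keeps df2's non-null values and falls back to df1 for missing values and for rows that exist only in df1. This works here because every row in df2 is newer than every row in df1: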

df2.combine_first(df1)

Output:

          age  height  weight  update_time
name                                      
Bertram  45.0   190.0    65.0          2.0
Donald   75.0     NaN    85.0          2.0
Hugo     22.0     NaN    77.0          2.0
Niklas   46.0     NaN    65.0          1.0
Ronald   77.0   176.0    84.0          2.0
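
The combine_first call above relies on df2 being the newer frame for every row. If that is not guaranteed, one possible approach is to stack both frames, sort by update_time with the newest rows first, and take the first non-null value per column for each name. The sketch below is only an illustration under that assumption: the helper name merge_newest and its time_col parameter are hypothetical and not part of pandas, and the frames are rebuilt with explicit columns so the block is self-contained.

```python
import numpy as np
import pandas as pd


def merge_newest(*frames, time_col="update_time"):
    """Merge frames on their index: newer rows win, gaps are filled from older rows.

    Hypothetical helper for illustration; rows with identical update times
    are resolved in the order the frames were passed in.
    """
    stacked = pd.concat(frames, sort=False)
    # Stable sort so that the newest row of each index label comes first.
    newest_first = stacked.sort_values(time_col, ascending=False, kind="mergesort")
    # GroupBy.first() returns the first non-null value per column in each group.
    return newest_first.groupby(level=0).first()


df1 = pd.DataFrame(
    {"age": [21, 46, 76], "height": [np.nan, np.nan, 176], "weight": [75, 65, 85]},
    index=pd.Index(["Hugo", "Niklas", "Ronald"], name="name"),
)
df1["update_time"] = 1

df2 = pd.DataFrame(
    {"age": [22, 45, 75, 77],
     "height": [np.nan, 190, np.nan, np.nan],
     "weight": [77, 65, 85, 84]},
    index=pd.Index(["Hugo", "Bertram", "Donald", "Ronald"], name="name"),
)
df2["update_time"] = 2

merged = merge_newest(df1, df2)
print(merged)
```

Because the rows are ordered by update_time rather than by which frame they came from, the order of the arguments should not matter as long as the update times differ, and the same helper can take more than two frames.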