Merge two pandas DataFrames on the nearest timestamp

Time: 2015-11-03 05:18:42

Tags: python pandas

I have two DataFrames, df1 and df2.

df1 is:

time                  status
2/2/2015 8.00 am      on time
2/2/2015 9.00 am      canceled
2/2/2015 10.30 am     on time
2/2/2015 12.45 pm     on time

df2 is:

 w_time                 temp
 2/2/2015 8.00 am      45
 2/2/2015 8.50 am      46
 2/2/2015 9.40 am      47
 2/2/2015 10.15 am     47
 2/2/2015 10.35 am     48
 2/2/2015 12.00 pm     48
 2/2/2015 1.00 pm      49

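Now I want to merge the two DataFrames so that each row of df1 is paired with the df2 row whose timestamp is closest to, or equal to, the df1 timestamp.

The result should be:

time                status     w_time              temp
2/2/2015 8.00 am    on time    2/2/2015 8.00 am    45
2/2/2015 9.00 am    canceled   2/2/2015 8.50 am    46
2/2/2015 10.30 am   on time    2/2/2015 10.35 am   48
2/2/2015 12.45 pm   on time    2/2/2015 1.00 pm    49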

1 answer:

Answer 0 (score: 8)

First, make sure the date columns are datetime64 columns.

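import pandas as pd
# regex=False treats "." as a literal dot rather than a regex wildcard
df1['time'] = pd.to_datetime(df1['time'].str.replace(".", ":", regex=False))
df2['w_time'] = pd.to_datetime(df2['w_time'].str.replace(".", ":", regex=False))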
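
As an alternative that is not part of the original answer, the strings can probably be parsed directly by passing an explicit format to pd.to_datetime, which skips the string replacement. The sketch below rebuilds the question's two frames by hand and assumes the dates are month/day/year (the sample data, 2/2/2015, does not settle this) and that the time always looks like "8.00 am".

```python
import pandas as pd

# Sample frames rebuilt from the question's data.
df1 = pd.DataFrame({
    "time": ["2/2/2015 8.00 am", "2/2/2015 9.00 am",
             "2/2/2015 10.30 am", "2/2/2015 12.45 pm"],
    "status": ["on time", "canceled", "on time", "on time"],
})
df2 = pd.DataFrame({
    "w_time": ["2/2/2015 8.00 am", "2/2/2015 8.50 am", "2/2/2015 9.40 am",
               "2/2/2015 10.15 am", "2/2/2015 10.35 am", "2/2/2015 12.00 pm",
               "2/2/2015 1.00 pm"],
    "temp": [45, 46, 47, 47, 48, 48, 49],
})

# Assumed layout: month/day/year, 12-hour clock, "." between hours and minutes.
fmt = "%m/%d/%Y %I.%M %p"
df1["time"] = pd.to_datetime(df1["time"], format=fmt)
df2["w_time"] = pd.to_datetime(df2["w_time"], format=fmt)
```

If the real data is day/month/year, swap %m and %d in the format string.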

If you set these as the DatetimeIndex, you can use reindex with method='nearest':

In [11]: df1 = df1.set_index("time")

In [12]: df2 = df2.set_index("w_time", drop=False)

In [13]: df1
Out[13]:
                       status
time
2015-02-02 08:00:00   on time
2015-02-02 09:00:00  canceled
2015-02-02 10:30:00   on time
2015-02-02 12:45:00   on time

In [14]: df2
Out[14]:
                     temp              w_time
w_time
2015-02-02 08:00:00    45 2015-02-02 08:00:00
2015-02-02 08:50:00    46 2015-02-02 08:50:00
2015-02-02 09:40:00    47 2015-02-02 09:40:00
2015-02-02 10:15:00    47 2015-02-02 10:15:00
2015-02-02 10:35:00    48 2015-02-02 10:35:00
2015-02-02 12:00:00    48 2015-02-02 12:00:00
2015-02-02 13:00:00    49 2015-02-02 13:00:00

Then reindex df2 on df1's index:

In [15]: df2.reindex(df1.index, method='nearest')
Out[15]:
                     temp              w_time
time
2015-02-02 08:00:00    45 2015-02-02 08:00:00
2015-02-02 09:00:00    46 2015-02-02 08:50:00
2015-02-02 10:30:00    48 2015-02-02 10:35:00
2015-02-02 12:45:00    49 2015-02-02 13:00:00

Then add these columns back to df1, for example with a join.
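
A minimal end-to-end sketch of that last step, using the question's data with already-parsed timestamps. The variable names nearest, merged and asof are only illustrative. The final lines show pd.merge_asof with direction="nearest", a different technique from the one in the answer that is available in newer pandas releases (0.20 or later, as far as I know) and that should give the same pairing here.

```python
import pandas as pd

df1 = pd.DataFrame({
    "time": pd.to_datetime(["2015-02-02 08:00", "2015-02-02 09:00",
                            "2015-02-02 10:30", "2015-02-02 12:45"]),
    "status": ["on time", "canceled", "on time", "on time"],
}).set_index("time")

df2 = pd.DataFrame({
    "w_time": pd.to_datetime(["2015-02-02 08:00", "2015-02-02 08:50",
                              "2015-02-02 09:40", "2015-02-02 10:15",
                              "2015-02-02 10:35", "2015-02-02 12:00",
                              "2015-02-02 13:00"]),
    "temp": [45, 46, 47, 47, 48, 48, 49],
}).set_index("w_time", drop=False)  # keep w_time as a column too

# For each df1 timestamp, pick the df2 row with the nearest index value.
nearest = df2.reindex(df1.index, method="nearest")

# Both frames now share df1's index, so a plain index join lines them up.
merged = df1.join(nearest).reset_index()

# Alternative (not from the answer): merge_asof on sorted key columns.
asof = pd.merge_asof(
    df1.reset_index(),
    df2.reset_index(drop=True),
    left_on="time",
    right_on="w_time",
    direction="nearest",
)
```

Both approaches need the timestamps to be sorted. reindex with method='nearest' requires a monotonic index, and merge_asof requires sorted keys.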