Compare values based on the nearest datetime

Time: 2016-11-07 15:21:11

Tags: python datetime pandas

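I have two pandas DataFrames, each with two columns: datetime and value (float). For each row of DataFrame A, I want to subtract the value of the row in DataFrame B whose datetime is nearest.

Example: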

dataframe A:
datetime | value
01-01-2016 00:00 | 10
01-01-2016 01:00 | 12
01-01-2016 02:00 | 14
01-01-2016 03:00 | 12
01-01-2016 04:00 | 12
01-01-2016 05:00 | 16
01-01-2016 06:00 | 18


dataframe B:
datetime | value
01-01-2016 00:20 | 5
01-01-2016 00:50 | -5
01-01-2016 01:20 | 12
01-01-2016 01:50 | 30
01-01-2016 02:20 | 1
01-01-2016 02:50 | 6
01-01-2016 03:50 | 0

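For the first row of A, the nearest datetime in B is also the first row, so the result is 10 - 5 = 5. For the fourth row of A (01-01-2016 03:00), the sixth row of B is the nearest, so the difference is 12 - 6 = 6.

I currently do this with a for loop: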

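for i, row in data.iterrows():
    # i is the index label, a Timestamp
    # get_indexer replaces get_loc(i, method='nearest'), which newer pandas no longer accepts
    pos = baro.index.get_indexer([i], method='nearest')[0]
    data.loc[i, 'h'] = row['h'] - baro['h'].iloc[pos]  # .loc avoids chained assignment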

It works fine, but can it be done faster?

2 answers:

Answer 0: (score: 4)

New in pandas 0.19:

pd.merge_asof
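
Since the original answer only showed a screenshot, here is a minimal sketch of how pd.merge_asof could be applied to the sample data from the question. The frame names df_a and df_b and the suffixes are chosen for illustration, the dates are written in ISO form, and the sketch assumes a pandas version whose merge_asof accepts direction='nearest' (without that argument it matches backward only).

```python
import pandas as pd

# Sample data from the question, with dates written in ISO form.
df_a = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2016-01-01 00:00", "2016-01-01 01:00", "2016-01-01 02:00",
        "2016-01-01 03:00", "2016-01-01 04:00", "2016-01-01 05:00",
        "2016-01-01 06:00",
    ]),
    "value": [10, 12, 14, 12, 12, 16, 18],
})
df_b = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2016-01-01 00:20", "2016-01-01 00:50", "2016-01-01 01:20",
        "2016-01-01 01:50", "2016-01-01 02:20", "2016-01-01 02:50",
        "2016-01-01 03:50",
    ]),
    "value": [5, -5, 12, 30, 1, 6, 0],
})

# merge_asof requires both frames to be sorted by the key column.
# direction="nearest" is an assumption about the installed pandas version.
merged = pd.merge_asof(
    df_a, df_b, on="datetime", direction="nearest", suffixes=("_a", "_b")
)

# A minus the nearest B value, row by row.
diff = merged["value_a"] - merged["value_b"]
print(diff.tolist())
```

On this data the differences match the worked examples in the question (5 for the first row, 6 for the fourth).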


Answer 1: (score: 2)

IIUC, you can use reindex(..., method='nearest') if you are on a pandas version < 0.19.0. From 0.19.0 onward, pd.merge_asof definitely makes sense, as it is both more convenient and more efficient:

df1 = df1.set_index('datetime')
df2 = df2.set_index('datetime')

In [214]: df1.join(df2.reindex(df1.index, method='nearest'), rsuffix='_right')
Out[214]:
                     value  value_right
datetime
2016-01-01 00:00:00     10            5
2016-01-01 01:00:00     12           -5
2016-01-01 02:00:00     14           30
2016-01-01 03:00:00     12            6
2016-01-01 04:00:00     12            0
2016-01-01 05:00:00     16            0
2016-01-01 06:00:00     18            0

In [224]: df1.value - df2.reindex(df1.index, method='nearest').value
Out[224]:
datetime
2016-01-01 00:00:00     5
2016-01-01 01:00:00    17
2016-01-01 02:00:00   -16
2016-01-01 03:00:00     6
2016-01-01 04:00:00    12
2016-01-01 05:00:00    16
2016-01-01 06:00:00    18
Name: value, dtype: int64

In [218]: merged = df1.join(df2.reindex(df1.index, method='nearest'), rsuffix='_right')

In [220]: merged.value.subtract(merged.value_right)
Out[220]:
datetime
2016-01-01 00:00:00     5
2016-01-01 01:00:00    17
2016-01-01 02:00:00   -16
2016-01-01 03:00:00     6
2016-01-01 04:00:00    12
2016-01-01 05:00:00    16
2016-01-01 06:00:00    18
dtype: int64
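
Note that in the output above the last two rows of A (05:00 and 06:00) are matched to the 03:50 row of B, which is more than an hour away. If such distant matches are unwanted, reindex also accepts a tolerance argument. The sketch below rebuilds the sample frames and uses a 30-minute limit, which is an arbitrary value picked for illustration and not something the question specifies.

```python
import pandas as pd

# Sample data from the question, indexed by datetime as in the answer above.
df1 = pd.DataFrame(
    {"value": [10, 12, 14, 12, 12, 16, 18]},
    index=pd.to_datetime([
        "2016-01-01 00:00", "2016-01-01 01:00", "2016-01-01 02:00",
        "2016-01-01 03:00", "2016-01-01 04:00", "2016-01-01 05:00",
        "2016-01-01 06:00",
    ]),
)
df2 = pd.DataFrame(
    {"value": [5, -5, 12, 30, 1, 6, 0]},
    index=pd.to_datetime([
        "2016-01-01 00:20", "2016-01-01 00:50", "2016-01-01 01:20",
        "2016-01-01 01:50", "2016-01-01 02:20", "2016-01-01 02:50",
        "2016-01-01 03:50",
    ]),
)
df1.index.name = df2.index.name = "datetime"

# Only accept a nearest match within 30 minutes (hypothetical threshold);
# rows of df1 with no match inside that window get NaN.
nearest = df2.reindex(
    df1.index, method="nearest", tolerance=pd.Timedelta("30min")
)
diff = df1["value"] - nearest["value"]
print(diff)
```

With this limit the 05:00 and 06:00 rows come out as NaN instead of silently reusing the 03:50 value, and the result becomes a float series.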