I have two different DataFrames. Df1 has timestamps at irregular intervals, like this:
time sales
2019-01-01 2:00:00 20000
2019-01-01 2:20:00 15600
2019-01-01 2:40:00 15444
...
2019-12-01 3:00:00 13000
2019-12-01 3:30:00 650
Df2 has timestamps at 1-minute intervals, like this:
time ratings
2019-01-01 2:01:00 0.04
2019-01-01 2:02:00 0.04
2019-01-01 2:03:00 0.04
2019-01-01 2:04:00 0.04
...
2019-12-01 3:00:00 0.01
2019-12-02 3:01:00 0.01
I want to merge the two DataFrames like this:
time sales ratings
2019-01-01 2:00:00 20000 [mean of ratings from 2:00:00 ~2:19:00]
2019-01-01 2:20:00 15600 [mean of ratings from 2:20:00 ~2:39:00]
2019-01-01 2:40:00 15444 [mean of ratings from 2:40:00 ~2:59:00]
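To pin down the windowing described above, here is a slow but explicit reference sketch: for each df1 row it averages the df2 ratings whose time t satisfies df1.time[i] <= t < df1.time[i+1]. It assumes both time columns are datetime64 and that the last window is open-ended (the question does not say where it should stop). The function name mean_ratings_per_window and the toy ratings (1.0, 2.0 and 3.0 in the three windows) are hypothetical.

```python
import pandas as pd


def mean_ratings_per_window(df1, df2):
    """Slow reference version: for each df1 row, average the df2 ratings
    with df1.time[i] <= t < df1.time[i + 1] (hypothetical helper)."""
    df1 = df1.sort_values('time').reset_index(drop=True)
    starts = df1['time']
    # assumption: the last window has no successor, so leave it open-ended
    ends = list(starts.iloc[1:]) + [pd.Timestamp.max]
    means = []
    for start, end in zip(starts, ends):
        in_window = (df2['time'] >= start) & (df2['time'] < end)
        means.append(df2.loc[in_window, 'ratings'].mean())
    out = df1.copy()
    out['ratings'] = means
    return out


# hypothetical toy data shaped like the question's frames
df1 = pd.DataFrame({
    'time': pd.to_datetime(['2019-01-01 02:00', '2019-01-01 02:20', '2019-01-01 02:40']),
    'sales': [20000, 15600, 15444],
})
minutes = pd.date_range('2019-01-01 02:01', '2019-01-01 02:59', freq='min')
df2 = pd.DataFrame({'time': minutes,
                    'ratings': [1.0 + (m.minute // 20) for m in minutes]})

print(mean_ratings_per_window(df1, df2))
```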
I'd appreciate any help! Thanks :)
Answer 0 (score: 0)
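Let's try pd.cut. Each df1 timestamp becomes the left edge of a bin; right=False makes the bins left-closed, so [2:00, 2:20) collects the ratings from 2:00 to 2:19, and a far-future date closes the last bin. Both time columns must be datetime64, and df1 must be sorted because bin edges have to increase monotonically:
import pandas as pd

df1['time'] = pd.to_datetime(df1['time'])
df2['time'] = pd.to_datetime(df2['time'])
df1 = df1.sort_values('time').reset_index(drop=True)

# label each df2 row with the df1 timestamp that starts its window
lower_bounds = pd.cut(df2['time'],
                      bins=list(df1['time']) + [pd.to_datetime('2050-01-01')],
                      right=False, include_lowest=True,
                      labels=df1['time'])
# mean rating per window (the column is 'ratings'), in df1's row order
df1['ratings'] = (df2.groupby(lower_bounds, observed=False)
                  ['ratings'].mean()
                  .reindex(df1['time'])
                  .values
                  )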
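A self-contained run of the pd.cut approach on toy data, handy for checking the bin boundaries. The frames are hypothetical stand-ins for the question's data, with ratings set to 1.0, 2.0 and 3.0 in the three windows so the expected means are obvious. This sketch reads the per-bin means positionally instead of reindexing, which relies on observed=False keeping every bin in df1's order.

```python
import pandas as pd

# hypothetical toy data shaped like the question's frames
df1 = pd.DataFrame({
    'time': pd.to_datetime(['2019-01-01 02:00', '2019-01-01 02:20', '2019-01-01 02:40']),
    'sales': [20000, 15600, 15444],
})
minutes = pd.date_range('2019-01-01 02:01', '2019-01-01 02:59', freq='min')
df2 = pd.DataFrame({'time': minutes,
                    'ratings': [1.0 + (m.minute // 20) for m in minutes]})

# left-closed bins [t_i, t_{i+1}); a far-future sentinel closes the last bin
edges = list(df1['time']) + [pd.Timestamp('2050-01-01')]
bins = pd.cut(df2['time'], bins=edges, right=False, labels=df1['time'])

# observed=False keeps all bins, in category (= df1) order, even empty ones
per_bin = df2.groupby(bins, observed=False)['ratings'].mean()
df1['ratings'] = per_bin.to_numpy()
print(df1)
```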
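Or you can use merge_asof, which attaches to each df2 row the most recent df1 row at or before its time (direction='backward'). merge_asof requires both frames to be sorted on the 'on' key:
df1 = df1.sort_values('time').reset_index(drop=True)
df2 = df2.sort_values('time')
df1['ratings'] = pd.merge_asof(df2, df1.reset_index(),
                               on='time',
                               direction='backward'
                               ).groupby('index')['ratings'].mean()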
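The same toy check for the merge_asof variant. Again the frames are hypothetical stand-ins; only the 'index' and 'time' columns of df1 are passed to the merge so 'sales' is not duplicated, and the final assignment aligns the per-group means on df1's RangeIndex.

```python
import pandas as pd

# hypothetical toy data shaped like the question's frames
df1 = pd.DataFrame({
    'time': pd.to_datetime(['2019-01-01 02:00', '2019-01-01 02:20', '2019-01-01 02:40']),
    'sales': [20000, 15600, 15444],
})
minutes = pd.date_range('2019-01-01 02:01', '2019-01-01 02:59', freq='min')
df2 = pd.DataFrame({'time': minutes,
                    'ratings': [1.0 + (m.minute // 20) for m in minutes]})

# merge_asof needs both frames sorted on the key
df1 = df1.sort_values('time').reset_index(drop=True)
df2 = df2.sort_values('time')

# tag each df2 row with the row number of the latest df1 row with time <= its own
tagged = pd.merge_asof(df2, df1.reset_index()[['index', 'time']],
                       on='time', direction='backward')

# average per df1 row; assignment aligns on df1's 0..n-1 index
df1['ratings'] = tagged.groupby('index')['ratings'].mean()
print(df1)
```

With direction='backward' the last df1 row absorbs every later rating, which matches the open-ended last bin of the pd.cut version. Any df2 rows earlier than the first df1 timestamp get a missing 'index', and groupby drops them by default.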