I have a df with the usual timestamps as its index:
2011-04-01 09:30:00
2011-04-01 09:30:10
...
2011-04-01 09:36:20
...
2011-04-01 09:37:30
How can I create a column in this DataFrame with the same timestamps, but rounded to the nearest 5-minute interval? Like this:
index new_col
2011-04-01 09:30:00 2011-04-01 09:35:00
2011-04-01 09:30:10 2011-04-01 09:35:00
2011-04-01 09:36:20 2011-04-01 09:40:00
2011-04-01 09:37:30 2011-04-01 09:40:00
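Answer 0 (score: 14)
The round_to_5min(t) solution using timedelta arithmetic is correct but complicated and very slow. Instead, take advantage of the pandas Timestamp, which is stored as an integer count of nanoseconds: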
import numpy as np
import pandas as pd
ns5min=5*60*1000000000 # 5 minutes in nanoseconds
pd.to_datetime(((df.index.astype(np.int64) // ns5min + 1 ) * ns5min))
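To see what that integer arithmetic does to a single value, here is a minimal sketch in plain Python, with no pandas involved. The names NS_5MIN, EPOCH and next_5min_mark are made up for this illustration and are not part of the answer above. It assumes naive timestamps counted from 1970-01-01, which is how datetime64[ns] values are stored.

```python
from datetime import datetime, timedelta

NS_5MIN = 5 * 60 * 1_000_000_000  # 5 minutes in nanoseconds
EPOCH = datetime(1970, 1, 1)      # origin of the nanosecond count (assumed naive)

def next_5min_mark(t):
    """Apply (ns // NS_5MIN + 1) * NS_5MIN to one naive datetime (illustrative helper)."""
    elapsed = t - EPOCH
    # Build the nanosecond count from integer fields to avoid float rounding.
    ns = ((elapsed.days * 86_400 + elapsed.seconds) * 1_000_000
          + elapsed.microseconds) * 1_000
    # Floor to the current 5-minute bucket, then step to the start of the next one.
    bumped_ns = (ns // NS_5MIN + 1) * NS_5MIN
    return EPOCH + timedelta(microseconds=bumped_ns // 1_000)

samples = [
    datetime(2011, 4, 1, 9, 30, 0),
    datetime(2011, 4, 1, 9, 30, 10),
    datetime(2011, 4, 1, 9, 36, 20),
    datetime(2011, 4, 1, 9, 37, 30),
]
for t in samples:
    print(t, "->", next_5min_mark(t))
```

Because of the `+ 1`, a timestamp that already sits on a 5-minute mark (09:30:00) is moved to the next mark (09:35:00), which matches the sample output in the question.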
Let's compare the speed:
rng = pd.date_range('1/1/2014', '1/2/2014', freq='S')
print(len(rng))
# 86401
# ipython %timeit
%timeit pd.to_datetime(((rng.astype(np.int64) // ns5min + 1 ) * ns5min))
# 1000 loops, best of 3: 1.01 ms per loop
%timeit rng.map(round_to_5min)
# 1 loops, best of 3: 1.03 s per loop
About 1000 times faster!
Answer 1 (score: 5)
You could try something like this:
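import datetime

def round_to_5min(t):
    # Time elapsed since the last 5-minute mark
    delta = datetime.timedelta(minutes=t.minute % 5,
                               seconds=t.second,
                               microseconds=t.microsecond)
    t -= delta  # drop back to that mark
    if delta > datetime.timedelta(0):
        t += datetime.timedelta(minutes=5)  # not on a mark: move up to the next one
    return t

df['new_col'] = df.index.map(round_to_5min)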
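A quick check of how this function treats a timestamp that is already on a 5-minute mark, as a sketch using plain datetime objects. The function is repeated so the snippet runs on its own, and the names on_mark and off_mark are made up for the illustration.

```python
import datetime

def round_to_5min(t):
    # Time elapsed since the last 5-minute mark
    delta = datetime.timedelta(minutes=t.minute % 5,
                               seconds=t.second,
                               microseconds=t.microsecond)
    t -= delta
    if delta > datetime.timedelta(0):
        t += datetime.timedelta(minutes=5)
    return t

on_mark = datetime.datetime(2011, 4, 1, 9, 30, 0)    # exactly on a mark
off_mark = datetime.datetime(2011, 4, 1, 9, 30, 10)  # 10 seconds past a mark

print(round_to_5min(on_mark))   # stays at 09:30:00
print(round_to_5min(off_mark))  # moves up to 09:35:00
```

So this version behaves like a ceiling: values already on a mark are left alone. The sample output in the question maps 09:30:00 to 09:35:00 instead, so which behaviour you want at exact marks is worth deciding up front.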
Answer 2 (score: 2)
I had the same problem, but with datetime64[ns] timestamps.
I used:
import datetime

def round_to_5min(t):
    """Round a timestamp down to the preceding 5-minute mark (seconds are dropped)."""
    t = datetime.datetime(t.year, t.month, t.day, t.hour, t.minute - t.minute % 5, 0)
    return t
and then applied it with the 'map' function.
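For what it's worth, recent pandas versions also have floor and ceil methods on DatetimeIndex, which cover the same ground without a Python-level function. A minimal sketch, assuming a pandas version that provides these methods and the "5min" frequency alias; the column names and the "value" column are made up for the illustration.

```python
import pandas as pd

idx = pd.DatetimeIndex([
    "2011-04-01 09:30:00",
    "2011-04-01 09:30:10",
    "2011-04-01 09:36:20",
    "2011-04-01 09:37:30",
])
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Preceding 5-minute mark (what the function above computes)
df["floor_5min"] = df.index.floor("5min")
# Following 5-minute mark; values already on a mark are unchanged
df["ceil_5min"] = df.index.ceil("5min")
# Always the next mark, even from an exact mark (the question's sample output)
df["next_5min"] = df.index.floor("5min") + pd.Timedelta(minutes=5)

print(df)
```

The three columns differ only in how they treat timestamps that are already on a mark and in the direction of rounding, so pick the one that matches the output you expect.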
Answer 3 (score: 0)