I have a pandas time series with 30-minute intervals. A small sample looks like this:
2009-12-02 20:00:00 0.6
2009-12-02 20:30:00 0.7
2009-12-03 01:00:00 0.7
2009-12-03 02:30:00 0.7
2009-12-03 11:30:00 0.7
2009-12-03 12:00:00 1.4
2009-12-03 12:30:00 1.3
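Whenever the gap between two consecutive timestamps is more than 2 hours, a new event starts. I need to identify the start and end date of each event (and store them). For example: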
event 1 : 2009-12-02 20:00:00 - 2009-12-02 20:30:00
event 2 : 2009-12-03 01:00:00 - 2009-12-03 02:30:00
event 3 : 2009-12-03 11:30:00 - 2009-12-03 12:30:00
But I'm a bit stuck here! Normally, if it were a DataFrame, I would use something like:
for index, row in df.iterrows():
    # if timedelta > 2 hours etc.
Any suggestions on how to get started? Thanks.
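For reference, the loop approach the question hints at can be completed along these lines. This is a minimal sketch that assumes the data is a pandas Series with a DatetimeIndex (as the sample suggests); for a Series, `Series.items()` plays the role of `DataFrame.iterrows()`. The names `max_gap` and `events` are illustrative, not from the question:

```python
import pandas as pd

# Assumption: the data is a Series indexed by timestamp, like the sample above.
s = pd.Series(
    [0.6, 0.7, 0.7, 0.7, 0.7, 1.4, 1.3],
    index=pd.to_datetime([
        "2009-12-02 20:00:00", "2009-12-02 20:30:00",
        "2009-12-03 01:00:00", "2009-12-03 02:30:00",
        "2009-12-03 11:30:00", "2009-12-03 12:00:00",
        "2009-12-03 12:30:00",
    ]),
)

max_gap = pd.Timedelta(hours=2)
events = []          # collected (start, end) pairs
start = prev = None

# Series.items() yields (index label, value) pairs, here (Timestamp, float)
for ts, value in s.items():
    if start is None:
        start = prev = ts            # first timestamp opens the first event
    elif ts - prev > max_gap:
        events.append((start, prev))  # gap too large: close the current event
        start = prev = ts             # and open a new one
    else:
        prev = ts                     # still inside the current event
if start is not None:
    events.append((start, prev))      # close the last event

for i, (first, last) in enumerate(events, 1):
    print(f"event {i} : {first} - {last}")
```

Looping row by row is easy to follow but slow on long series; the vectorised answer below avoids the Python-level loop.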
Answer 0 (score: 3)
Here is the code:
import pandas as pd
import io
import numpy as np
data = r"""date,value
2009-12-02 20:00:00,0.6
2009-12-02 20:30:00,0.7
2009-12-03 01:00:00,0.7
2009-12-03 02:30:00,0.7
2009-12-03 11:30:00,0.7
2009-12-03 12:00:00,1.4
2009-12-03 12:30:00,1.3"""
df = pd.read_csv(io.StringIO(data), parse_dates=[0])
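# Time elapsed since the previous timestamp (NaT for the first row)
diff = df.date - df.date.shift(1)
# A gap of more than 2 hours starts a new section; cumsum numbers the sections 0, 1, 2, ...
sections = (diff > np.timedelta64(2, "h")).astype(int).cumsum()
def f(s):
    # Keep only the first and last timestamp of each section
    return s.iloc[[0, -1]].reset_index(drop=True)
print(df.date.groupby(sections).apply(f).unstack())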
Output:
0 1
0 2009-12-02 20:00:00 2009-12-02 20:30:00
1 2009-12-03 01:00:00 2009-12-03 02:30:00
2 2009-12-03 11:30:00 2009-12-03 12:30:00
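A variant of the same grouping trick (not part of the original answer) replaces the shift-and-subtract step with `Series.diff` and the helper function with `agg(["first", "last"])`, which gives a DataFrame with one row per event that can be stored directly. A minimal sketch on the same sample data; the names `new_event`, `event_id`, `event`, `start` and `end` are illustrative choices:

```python
import io

import pandas as pd

data = """date,value
2009-12-02 20:00:00,0.6
2009-12-02 20:30:00,0.7
2009-12-03 01:00:00,0.7
2009-12-03 02:30:00,0.7
2009-12-03 11:30:00,0.7
2009-12-03 12:00:00,1.4
2009-12-03 12:30:00,1.3"""
df = pd.read_csv(io.StringIO(data), parse_dates=["date"])

# True where the gap to the previous timestamp exceeds 2 hours
# (the first row's diff is NaT, and NaT > Timedelta evaluates to False)
new_event = df["date"].diff() > pd.Timedelta(hours=2)

# Running count of breaks = event id per row: 0, 0, 1, 1, 2, 2, 2
event_id = new_event.cumsum().rename("event")

# First and last timestamp of every event, one row per event
events = df.groupby(event_id)["date"].agg(["first", "last"])
events.columns = ["start", "end"]
print(events)
```

Since `groupby` keeps rows in their original order within each group, `first` and `last` are the start and end timestamps as long as the data is sorted by time.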
Answer 1 (score: 0)
Here t.txt contains the log data (one timestamp and value per line):
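from datetime import datetime, timedelta
d1 = d2 = d3 = None  # d1: event start, d2: last timestamp seen, d3: current timestamp
with open('t.txt') as fh:
    for line in fh:
        d3 = datetime.strptime(line[:19], '%Y-%m-%d %H:%M:%S')
        if d1 is None:
            d1 = d2 = d3              # first line opens the first event
        elif d3 - d2 > timedelta(hours=2):
            print(d1, d2)             # gap of more than 2 hours: report the finished event
            d1 = d2 = d3
        else:
            d2 = d3                   # still inside the current event
print(d1, d2)                         # report the last event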
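Checking the gap through `timedelta.seconds` is a common pitfall in this kind of loop: `.seconds` holds only the part of the duration below one day, so a gap of a day or more can look small. A short demonstration with two hypothetical timestamps 25 hours apart; comparing the `timedelta` itself, or using `total_seconds()`, sees the full gap:

```python
from datetime import datetime, timedelta

d2 = datetime(2009, 12, 2, 20, 30)   # hypothetical end of the previous event
d3 = datetime(2009, 12, 3, 21, 30)   # hypothetical next timestamp, 25 hours later
gap = d3 - d2                        # equals timedelta(days=1, seconds=3600)

print(gap.seconds)                   # 3600: only the sub-day remainder
print(gap.total_seconds())           # 90000.0: the full gap

# The .seconds test misses this 25-hour gap; the other two checks catch it
print(gap.seconds >= 2 * 3600)           # False
print(gap.total_seconds() > 2 * 3600)    # True
print(gap > timedelta(hours=2))          # True
```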