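I have measurement data at 10-minute intervals. Sometimes the interval is 9 min 59 s or 10 min 01 s, and sometimes a value is missing, so the interval is 20 minutes.
I would like the code to do the following: resample the values to 10 minutes (which I have already achieved). The problem is that measurements taken at intervals other than exactly 10:00 minutes (9 min 59 s or 10 min 01 s) are lost, and I want to keep this data.
Here is my test code: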
import pandas as pd
import numpy as np

df = pd.DataFrame(columns=('Datetime','V_L1','V_H3_L1','V_H3_L1_in_P'))
df['Datetime'] = ['01.01.2012 00:00:00', '01.01.2012 00:10:01', '01.01.2012 00:29:59','01.01.2012 00:50:00']
df['V_L1'] = [219,219.7,np.nan,220.3]
df['V_H3_L1'] = [3,1,2.5, np.nan]
df['Datetime'] = pd.to_datetime(df['Datetime'])
# Resample onto a fixed 10-minute grid ('10min' is the same frequency as '600S');
# asfreq() keeps only rows whose timestamp falls exactly on the grid
df = df.set_index('Datetime').resample('10min').asfreq()
Output:
                      V_L1  V_H3_L1  V_H3_L1_in_P
Datetime
2012-01-01 00:00:00  219.0      3.0           NaN
2012-01-01 00:10:00    NaN      NaN           NaN
2012-01-01 00:20:00    NaN      NaN           NaN
2012-01-01 00:30:00    NaN      NaN           NaN
2012-01-01 00:40:00    NaN      NaN           NaN
2012-01-01 00:50:00  220.3      NaN           NaN
Desired output:
                      V_L1  V_H3_L1  V_H3_L1_in_P
Datetime
2012-01-01 00:00:00  219.0      3.0           NaN
2012-01-01 00:10:00  219.7      1.0           NaN
2012-01-01 00:20:00    NaN      NaN           NaN
2012-01-01 00:30:00    NaN      2.5           NaN
2012-01-01 00:40:00    NaN      NaN           NaN
2012-01-01 00:50:00  220.3      NaN           NaN
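So I would like to keep this data by accepting timestamps that deviate from the set frequency (10min, 600s) by a few seconds, say plus or minus 5 seconds.
Answer 0 (score: 1)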
df['Datetime'] = df['Datetime'].dt.round('min')
# '10min' is the same frequency as '600S'
df = df.set_index('Datetime').resample('10min').asfreq()
Round the datetimes to the nearest minute; after that you can set the index and resample.
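Note that rounding to the minute accepts any offset of up to 30 seconds. If only offsets within roughly ±5 seconds should be accepted, as the question asks, one possible alternative is `reindex` with `method='nearest'` and a `tolerance`. This is only a minimal sketch and not part of the answer above; the names `grid`, `tol` and `result` and the cut-down sample frame are my own assumptions.

```python
import numpy as np
import pandas as pd

# Cut-down version of the question's sample data (assumed for illustration).
df = pd.DataFrame({
    'Datetime': pd.to_datetime(
        ['01.01.2012 00:00:00', '01.01.2012 00:10:01',
         '01.01.2012 00:29:59', '01.01.2012 00:50:00'],
        format='%d.%m.%Y %H:%M:%S'),
    'V_L1': [219, 219.7, np.nan, 220.3],
    'V_H3_L1': [3, 1, 2.5, np.nan],
}).set_index('Datetime')

# Regular 10-minute grid spanning the data.
grid = pd.date_range(df.index.min().floor('10min'),
                     df.index.max().ceil('10min'), freq='10min')

# For each grid point take the nearest measured row, but only if it lies
# within the tolerance; grid points without such a row become all-NaN.
tol = pd.Timedelta(seconds=5)
result = df.sort_index().reindex(grid, method='nearest', tolerance=tol)
result.index.name = 'Datetime'
print(result)
```

With this approach a measurement that is more than 5 seconds away from every grid point is dropped rather than shifted, so the tolerance should be chosen to match the real jitter in the data.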
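Answer 1 (score: 1)
Well, I wrote a function that is not very pretty (I have to admit), but it does what I want. Since I am dealing with a large amount of data, I think this may be a safe approach. Basically, using an if/elif structure, the function checks the minute part of the timestamp and decides from its value whether to round up or down. I am pretty sure there is a better way to do this; please share if you have one.
So the code is: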
import datetime

def round_time(time):
    # Round a timestamp to the nearest 10-minute mark, based on its minute part.
    if time.minute >= 55:
        # Round up to the next full hour. Adding one hour to the top of the
        # current hour also rolls over to the next day at 23:55 and later,
        # so no special case for hour 23 is needed.
        rounded = time-datetime.timedelta(minutes=time.minute, seconds=time.second)+datetime.timedelta(hours=1)
    elif time.minute >= 45:
        rounded = time-datetime.timedelta(minutes=time.minute, seconds=time.second)+datetime.timedelta(minutes=50)
    elif time.minute >= 35:
        rounded = time-datetime.timedelta(minutes=time.minute, seconds=time.second)+datetime.timedelta(minutes=40)
    elif time.minute >= 25:
        rounded = time-datetime.timedelta(minutes=time.minute, seconds=time.second)+datetime.timedelta(minutes=30)
    elif time.minute >= 15:
        rounded = time-datetime.timedelta(minutes=time.minute, seconds=time.second)+datetime.timedelta(minutes=20)
    elif time.minute >= 5:
        rounded = time-datetime.timedelta(minutes=time.minute, seconds=time.second)+datetime.timedelta(minutes=10)
    else:
        rounded = time-datetime.timedelta(minutes=time.minute, seconds=time.second)
    return rounded

df['Datetime'] = df['Datetime'].apply(round_time)
# '10min' is the same frequency as '600S'
df = df.set_index('Datetime').resample('10min').asfreq()
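Adapted from: How do I round datetime column to nearest quarter hour
Although the solution in that thread did not work for 10-minute values, it is still a good reference! (29 minutes was still rounded to 20 rather than the 30 I wanted.)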
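As for the better way asked about above: pandas can round directly to a 10-minute frequency with `Series.dt.round('10min')`, which seems to cover the same cases as `round_time`, including the 29-minute case and the rollover past midnight. A minimal sketch follows; the sample timestamps and the names `stamps` and `rounded` are my own, and exact ties such as 00:05:00 may not be resolved the same way as the `>=` rule in the function above.

```python
import pandas as pd

# Sample timestamps chosen for illustration (not from the original post).
stamps = pd.Series(pd.to_datetime([
    '2012-01-01 00:10:01',   # one second late
    '2012-01-01 00:29:00',   # the 29-minute case mentioned above
    '2012-01-01 00:29:59',   # one second early
    '2012-01-01 23:57:30',   # rolls over to the next day
]))

# Round every timestamp to the nearest multiple of 10 minutes.
rounded = stamps.dt.round('10min')
print(rounded)
```

In the answer's code this would replace the `apply(round_time)` line, that is `df['Datetime'] = df['Datetime'].dt.round('10min')`, before setting the index and resampling.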