Resampling without losing data whose timestamps are slightly off the frequency

Asked: 2019-05-09 20:34:24

Tags: python pandas

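I have measurement data at 10-minute intervals. Sometimes the interval is 9 min 59 s or 10 min 01 s, and sometimes a value is missing, so the interval is 20 minutes.

I want the code to do the following: resample to 10-minute values (which I have already implemented). The problem is that measurements taken at intervals other than exactly 10:00 minutes (9 min 59 s or 10 min 01 s) are lost, and I want to keep that data.

Here is my test code: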

import pandas as pd
import numpy as np
df = pd.DataFrame(columns=('Datetime','V_L1','V_H3_L1','V_H3_L1_in_P'))

df['Datetime'] = ['01.01.2012 00:00:00', '01.01.2012 00:10:01', '01.01.2012 00:29:59','01.01.2012 00:50:00']
df['V_L1'] = [219,219.7,np.nan,220.3]
df['V_H3_L1'] = [3,1,2.5, np.nan]
df['Datetime'] = pd.to_datetime(df['Datetime'])
# Index by timestamp and resample onto a regular 10-minute grid
df = df.set_index('Datetime').resample('600S').asfreq()

Output:

                      V_L1  V_H3_L1  V_H3_L1_in_P
Datetime                                         
2012-01-01 00:00:00  219.0      3.0           NaN
2012-01-01 00:10:00    NaN      NaN           NaN
2012-01-01 00:20:00    NaN      NaN           NaN
2012-01-01 00:30:00    NaN      NaN           NaN
2012-01-01 00:40:00    NaN      NaN           NaN
2012-01-01 00:50:00  220.3      NaN           NaN

Desired output:

                      V_L1  V_H3_L1  V_H3_L1_in_P
Datetime                                         
2012-01-01 00:00:00  219.0      3.0           NaN
2012-01-01 00:10:00  219.7      1.0           NaN
2012-01-01 00:20:00    NaN      NaN           NaN
2012-01-01 00:30:00    NaN      2.5           NaN
2012-01-01 00:40:00    NaN      NaN           NaN
2012-01-01 00:50:00  220.3      NaN           NaN

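So I want to keep the data, that is, accept timestamps that deviate from the frequency setting (10 min, 600 s) by only a few seconds, say ±5 seconds.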
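The ±5 second tolerance described above could be expressed directly with `reindex`, using `method='nearest'` together with a `tolerance`. This is only a minimal sketch on the sample data from the question; the names `grid` and `result` are illustrative, and it assumes the timestamps are sorted and unique.

```python
import numpy as np
import pandas as pd

# Sample data from the question, indexed by timestamp
df = pd.DataFrame({
    'Datetime': pd.to_datetime(
        ['01.01.2012 00:00:00', '01.01.2012 00:10:01',
         '01.01.2012 00:29:59', '01.01.2012 00:50:00'],
        format='%d.%m.%Y %H:%M:%S'),
    'V_L1': [219, 219.7, np.nan, 220.3],
    'V_H3_L1': [3, 1, 2.5, np.nan],
}).set_index('Datetime')

# Regular 10-minute grid spanning the data (illustrative helper name)
grid = pd.date_range(df.index.min().round('10min'),
                     df.index.max().round('10min'),
                     freq='10min', name='Datetime')

# For each grid point take the nearest measurement, but only if it lies
# within 5 seconds; grid points without such a measurement become NaN rows.
result = df.reindex(grid, method='nearest', tolerance=pd.Timedelta(seconds=5))
print(result)
```

Unlike rounding, this leaves a grid point empty when the closest measurement is more than 5 seconds away, so a reading taken at, say, 00:14:40 would not be pulled onto the grid.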

2 Answers:

Answer 0 (score: 1)

df['Datetime'] = df['Datetime'].dt.round('min')
df = df.set_index('Datetime').resample('600S').asfreq()

Round the datetimes to the nearest minute; after that you can set the index and resample.

Answer 1 (score: 1)
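As a check, here is a self-contained sketch that applies this rounding to the sample data from the question. The variable names `rounded` and `result` are illustrative, and the empty `V_H3_L1_in_P` column is left out for brevity.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Datetime': pd.to_datetime(
        ['01.01.2012 00:00:00', '01.01.2012 00:10:01',
         '01.01.2012 00:29:59', '01.01.2012 00:50:00'],
        format='%d.%m.%Y %H:%M:%S'),
    'V_L1': [219, 219.7, np.nan, 220.3],
    'V_H3_L1': [3, 1, 2.5, np.nan],
})

# Snap each timestamp to the nearest full minute, then resample to 10 minutes
rounded = df.assign(Datetime=df['Datetime'].dt.round('min'))
result = rounded.set_index('Datetime').resample('10min').asfreq()
print(result)
```

Note that rounding to the minute accepts deviations of up to about 30 seconds, which is wider than the ±5 seconds mentioned in the question. A timestamp such as 00:14:40 would be rounded to 00:15 and then dropped by the resample because it is not on the 10-minute grid.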

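Well, I wrote a function that is admittedly not very pretty, but it does what I want. Since I am working with a large amount of data, I think this is probably a safe approach. Basically, using an if/elif structure, the function checks the minute part of the timestamp and decides from its value whether to round up or down. I am pretty sure there is a better way to do this; please share it if you have one.

  • If the minute is >= 55, round up to the next full hour; elif >= 45, round to 50; elif >= 35, round to 40; and so on.

So the code is: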

import datetime

def round_time(time):
    # Truncate to the start of the hour; the rounded minute offset is added below.
    base = time - datetime.timedelta(minutes=time.minute, seconds=time.second,
                                     microseconds=time.microsecond)
    if time.minute >= 55:
        # Round up to the next full hour (this also rolls 23:55+ over to the next day).
        rounded = base + datetime.timedelta(hours=1)
    elif time.minute >= 45:
        rounded = base + datetime.timedelta(minutes=50)
    elif time.minute >= 35:
        rounded = base + datetime.timedelta(minutes=40)
    elif time.minute >= 25:
        rounded = base + datetime.timedelta(minutes=30)
    elif time.minute >= 15:
        rounded = base + datetime.timedelta(minutes=20)
    elif time.minute >= 5:
        rounded = base + datetime.timedelta(minutes=10)
    else:
        rounded = base
    return rounded

df['Datetime'] = df['Datetime'].apply(lambda x: round_time(x))
df = df.set_index('Datetime').resample('600S').asfreq()
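The same "nearest 10 minutes" rounding can probably be done without a hand-written function, since `Series.dt.round` accepts any fixed frequency. A minimal sketch, with made-up timestamps chosen to cover the roll-over past midnight:

```python
import pandas as pd

# Illustrative timestamps: slightly late, slightly early, and close to midnight
stamps = pd.Series(pd.to_datetime([
    '2012-01-01 00:10:01',
    '2012-01-01 00:29:59',
    '2012-01-01 23:57:12',
]))

# Vectorized rounding to the nearest 10-minute mark
rounded = stamps.dt.round('10min')
print(rounded)
```

This is vectorized, so it should scale better than `apply` on large data. Timestamps lying exactly on a half-way point (for example 00:05:00) may be resolved differently from the if/elif version, so check that case if it matters for your data.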

How do I round datetime column to nearest quarter hour

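Although the solution in the thread above did not solve my 10-minute case, it is still a good reference! (29 minutes was still rounded to 20 rather than the 30 I wanted.)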