Downsampling or mean-based data reduction over values and time index with Pandas

Time: 2019-01-04 01:09:32

Tags: python pandas

I have a time-indexed pandas DataFrame in which pairs of numbers are separated by one or more NaNs:

Time
1970-01-01 00:00:00.000    0.0186458125
1970-01-01 00:00:00.066   -0.0165843889
1970-01-01 00:00:00.068             NaN
1970-01-01 00:00:00.116             NaN
1970-01-01 00:00:00.118   -0.0113886875
1970-01-01 00:00:00.166   -0.0117582778
1970-01-01 00:00:00.168             NaN
1970-01-01 00:00:00.216             NaN
1970-01-01 00:00:00.218   -0.0122501875
1970-01-01 00:00:00.232   -0.0122501875
Name: X, dtype: float64

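What I want to do now is compute the mean of each pair of numbers and place the result at the midpoint between the pair's two timestamps, so that the result looks like this: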

Time
1970-01-01 00:00:00.000             NaN
1970-01-01 00:00:00.033    0.0010307118
1970-01-01 00:00:00.066             NaN
1970-01-01 00:00:00.068             NaN
1970-01-01 00:00:00.116             NaN
1970-01-01 00:00:00.118             NaN
1970-01-01 00:00:00.142   -0.0115734826
1970-01-01 00:00:00.166             NaN
1970-01-01 00:00:00.168             NaN
1970-01-01 00:00:00.216             NaN
1970-01-01 00:00:00.225   -0.0122501875
1970-01-01 00:00:00.232             NaN

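I also plan to downsample the time axis to a fixed frequency of 1/500 second, so there will certainly be more intermediate NaNs than shown above. Is there a more or less simple way to do this?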

Thanks!
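As a rough illustration of the fixed 1/500 s grid mentioned in the question, the sample can be rebuilt and placed on a 2 ms grid. This is only a sketch: the names `times_ms`, `values`, `s` and `on_grid` are made up here, and it assumes every timestamp already lies on the 2 ms grid (true for the sample shown, not necessarily for real data).

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (times in milliseconds since the epoch).
times_ms = [0, 66, 68, 116, 118, 166, 168, 216, 218, 232]
values = [0.0186458125, -0.0165843889, np.nan, np.nan, -0.0113886875,
          -0.0117582778, np.nan, np.nan, -0.0122501875, -0.0122501875]
s = pd.Series(values, index=pd.to_datetime(times_ms, unit="ms"), name="X")
s.index.name = "Time"

# 1/500 s is 2 ms. asfreq() reindexes onto a regular 2 ms grid running from
# the first to the last timestamp; grid points without a sample become NaN.
# Assumption: all timestamps sit on the grid. asfreq() silently drops any
# that do not, in which case s.resample("2ms").mean() would be safer.
on_grid = s.asfreq("2ms")

print(len(on_grid))           # total number of grid points
print(on_grid.notna().sum())  # how many of them carry a value
```

On the regular grid the pairs of values are separated by many more NaNs, which is the situation the question anticipates.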

1 answer:

Answer 0: (score: 1)

This is more of a step-by-step solution; I have broken it down:

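import numpy as np
import pandas as pd

df = df.to_frame('value')
# Every NaN starts a new group; a NaN row shares its key with the values that follow it
df['key'] = df['value'].isnull().cumsum()

# Convert the datetime index to numeric seconds so that it can be averaged
df['time'] = df.index.map(lambda x: x.timestamp())
# Average both value and time per group using groupby with agg
newdf = df.groupby('key').agg({'value': 'mean', 'time': 'mean'})
# Convert the float seconds back to datetime
newdf['time'] = pd.to_datetime(newdf['time'], unit='s')
newdf = newdf.set_index('time')['value']
df['value'] = np.nan

# combine_first: merge the averaged rows into the blanked-out original series
df = df['value'].combine_first(newdf)
df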
1970-01-01 00:00:00.000000         NaN
1970-01-01 00:00:00.033000    0.001031
1970-01-01 00:00:00.066000         NaN
1970-01-01 00:00:00.068000         NaN
1970-01-01 00:00:00.116000         NaN
1970-01-01 00:00:00.118000         NaN
1970-01-01 00:00:00.133333   -0.011573
1970-01-01 00:00:00.166000         NaN
1970-01-01 00:00:00.168000         NaN
1970-01-01 00:00:00.216000         NaN
1970-01-01 00:00:00.218000         NaN
1970-01-01 00:00:00.222000   -0.012250
1970-01-01 00:00:00.232000         NaN
Name: value, dtype: float64
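
Note that in the output above the second and third averaged rows land at 00:00:00.133333 and 00:00:00.222, not at the 00:00:00.142 and 00:00:00.225 asked for in the question. The NaN row directly before each pair gets the same key as the pair, so its timestamp is included in the time mean. One possible variant, sketched below, drops the NaN rows before grouping and uses the midpoint between the first and last timestamp of each run. `pair_means` is a hypothetical helper name (not a pandas function), and the sketch assumes valid values always come in runs separated by at least one NaN.

```python
import numpy as np
import pandas as pd


def pair_means(s: pd.Series) -> pd.Series:
    """Mean of each NaN-separated run of values, placed at the run's time midpoint."""
    frame = pd.DataFrame({
        "value": s.to_numpy(),
        "time": s.index,
        "key": s.isna().cumsum().to_numpy(),  # increases by one at every NaN
    })
    # Drop the NaN rows first so their timestamps cannot shift the midpoint.
    grouped = frame.dropna(subset=["value"]).groupby("key")
    start = grouped["time"].min()
    end = grouped["time"].max()
    midpoint = start + (end - start) / 2
    return pd.Series(
        grouped["value"].mean().to_numpy(),
        index=pd.DatetimeIndex(midpoint.to_numpy(), name=s.index.name),
        name=s.name,
    )


# Sample data copied from the question (times in milliseconds since the epoch).
times_ms = [0, 66, 68, 116, 118, 166, 168, 216, 218, 232]
values = [0.0186458125, -0.0165843889, np.nan, np.nan, -0.0113886875,
          -0.0117582778, np.nan, np.nan, -0.0122501875, -0.0122501875]
s = pd.Series(values, index=pd.to_datetime(times_ms, unit="ms"), name="X")

# Same final step as in the answer: blank the original rows, then merge.
blank = pd.Series(np.nan, index=s.index, name=s.name)
result = pair_means(s).combine_first(blank)
print(result.dropna())
```

For a pair, the midpoint of the two timestamps equals their mean, so this matches the intent of the answer while leaving the NaN rows out of the time calculation. Odd-millisecond midpoints such as 00:00:00.033 fall between the points of a 2 ms grid, so a later move to 1/500 s would need binning (for instance `result.resample("2ms").mean()`) instead of a plain reindex.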