I have a time-indexed pandas DataFrame in which pairs of numbers are separated by one or more NaNs:
Time
1970-01-01 00:00:00.000 0.0186458125
1970-01-01 00:00:00.066 -0.0165843889
1970-01-01 00:00:00.068 NaN
1970-01-01 00:00:00.116 NaN
1970-01-01 00:00:00.118 -0.0113886875
1970-01-01 00:00:00.166 -0.0117582778
1970-01-01 00:00:00.168 NaN
1970-01-01 00:00:00.216 NaN
1970-01-01 00:00:00.218 -0.0122501875
1970-01-01 00:00:00.232 -0.0122501875
Name: X, dtype: float64
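What I want to do is compute the mean of each pair of numbers and place the result at the midpoint in time between them, so that the result looks like this: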
Time
1970-01-01 00:00:00.000 NaN
1970-01-01 00:00:00.033 0.0010307118
1970-01-01 00:00:00.066 NaN
1970-01-01 00:00:00.068 NaN
1970-01-01 00:00:00.116 NaN
1970-01-01 00:00:00.118 NaN
1970-01-01 00:00:00.142 -0.0115734826
1970-01-01 00:00:00.166 NaN
1970-01-01 00:00:00.168 NaN
1970-01-01 00:00:00.216 NaN
1970-01-01 00:00:00.225 -0.0122501875
1970-01-01 00:00:00.232 NaN
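I also plan to downsample the time index to a fixed frequency of 1/500 s, so there will certainly be more NaNs in between than shown above. Is there a reasonably simple way to do this?
Thanks!
Answer 0 (score: 1)
This is more of a step-by-step solution; I have broken it down: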
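import numpy as np
import pandas as pd

df = df.to_frame('value')
# Cumulative NaN count: each pair shares a key (together with the NaN row just before it)
df['key'] = df['value'].isnull().cumsum()
# Convert the datetime index to numeric seconds so it can be averaged
df['time'] = df.index.map(lambda x: x.timestamp())
# Average both the values and the times within each group
newdf = df.groupby('key').agg({'value': 'mean', 'time': 'mean'})
# Convert the averaged seconds back to datetime
newdf.time = pd.to_datetime(newdf.time, unit='s')
newdf = newdf.set_index('time').value
df.value = np.nan
# combine_first merges the averaged rows into the blanked-out original series
df = df.value.combine_first(newdf)
df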
1970-01-01 00:00:00.000000 NaN
1970-01-01 00:00:00.033000 0.001031
1970-01-01 00:00:00.066000 NaN
1970-01-01 00:00:00.068000 NaN
1970-01-01 00:00:00.116000 NaN
1970-01-01 00:00:00.118000 NaN
1970-01-01 00:00:00.133333 -0.011573
1970-01-01 00:00:00.166000 NaN
1970-01-01 00:00:00.168000 NaN
1970-01-01 00:00:00.216000 NaN
1970-01-01 00:00:00.218000 NaN
1970-01-01 00:00:00.222000 -0.012250
1970-01-01 00:00:00.232000 NaN
Name: value, dtype: float64
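Note that in the output above the averaged rows land at 00:00:00.133333 and 00:00:00.222 rather than the 00:00:00.142 and 00:00:00.225 asked for in the question. Each group also contains the NaN row just before the pair, and that row's timestamp enters the time mean. Below is a minimal sketch of one way around this, assuming it is acceptable to drop the NaN rows before grouping and to place each mean halfway between the first and last timestamp of its run (which equals the mean time only when a run is a pair). The function name `pair_means` and the variable `s` are my own, not from the question.

```python
import numpy as np
import pandas as pd


def pair_means(s: pd.Series) -> pd.Series:
    """Put the mean of each NaN-separated run of numbers at the run's midpoint time."""
    not_nan = s.notna()
    valid = s[not_nan]
    # The cumulative NaN count is constant within a run of consecutive numbers
    key = s.isna().cumsum()[not_nan]
    stamps = valid.index.to_series().groupby(key)
    first, last = stamps.min(), stamps.max()
    # Midpoint between the first and last timestamp of each run
    mid = first + (last - first) / 2
    means = pd.Series(valid.groupby(key).mean().to_numpy(),
                      index=pd.DatetimeIndex(mid, name=s.index.name),
                      name=s.name)
    # Blank out the original rows, then merge in the averaged rows
    blank = pd.Series(np.nan, index=s.index, name=s.name)
    return blank.combine_first(means)


# Sample data copied from the question (timestamps in milliseconds)
index = pd.to_datetime([0, 66, 68, 116, 118, 166, 168, 216, 218, 232], unit="ms")
s = pd.Series(
    [0.0186458125, -0.0165843889, np.nan, np.nan, -0.0113886875,
     -0.0117582778, np.nan, np.nan, -0.0122501875, -0.0122501875],
    index=index.rename("Time"),
    name="X",
)
result = pair_means(s)
print(result.dropna())
```

This sketch is meant to run on the original, irregular index. Resampling to a 2 ms grid first would insert NaNs between the two members of each pair, so the runs would no longer be pairs.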