Mean of timedeltas when resampling in pandas

Date: 2017-03-30 10:39:22

Tags: python pandas

Given this DataFrame:

import pandas as pd

df = pd.DataFrame(pd.to_timedelta(['00:00:02','00:00:05','00:00:10','00:00:15','00:00:05']))
# The dates are day/month/year, so parse them with an explicit format
df.index = pd.to_datetime(['20/02/2017 12:42:10','20/02/2017 12:43:10','20/02/2017 12:45:10','20/02/2017 12:45:10','20/02/2017 12:45:10'],
                          format='%d/%m/%Y %H:%M:%S')
df.columns = ['time']

df
Out[232]: 
                        time
2017-02-20 12:42:10  00:00:02
2017-02-20 12:43:10  00:00:05
2017-02-20 12:45:10  00:00:10
2017-02-20 12:45:10  00:00:15
2017-02-20 12:45:10  00:00:05

I'm trying to resample to get the mean time per minute. For example, it works when summing them:

df.resample('min').sum()
Out[245]: 
                        time
2017-02-20 12:42:00 00:00:02
2017-02-20 12:43:00 00:00:05
2017-02-20 12:44:00 00:00:00
2017-02-20 12:45:00 00:00:30

Is there any way to make this work with the mean?

Something like:

df.resample('min').mean()
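Since the sum already works, one way to get the mean (a sketch, not the accepted answer's method) is to build it from that sum: divide each minute's total by its row count. The names `r`, `total_s`, `n` and `result` are illustrative, and the explicit day-first date format is an assumption about the input strings.

```python
import pandas as pd

# Rebuild the question's frame; the format string assumes day/month/year dates.
df = pd.DataFrame(
    {'time': pd.to_timedelta(['00:00:02', '00:00:05', '00:00:10', '00:00:15', '00:00:05'])},
    index=pd.to_datetime(
        ['20/02/2017 12:42:10', '20/02/2017 12:43:10', '20/02/2017 12:45:10',
         '20/02/2017 12:45:10', '20/02/2017 12:45:10'],
        format='%d/%m/%Y %H:%M:%S'),
)

r = df['time'].resample('min')
total_s = r.sum().dt.total_seconds()  # seconds summed per minute (the empty 12:44 bin sums to zero)
n = r.count()                          # number of rows per minute (0 for the empty bin)
mean_s = total_s / n                   # pandas returns NaN for 0.0 / 0 instead of raising
result = pd.to_timedelta(mean_s.fillna(0), unit='s')  # report empty minutes as 0 seconds
print(result)
```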

1 Answer:

Answer 0 (score: 1)

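You can first convert the timedeltas to total_seconds (floats), then resample, take the mean and use fillna for the empty minutes. Finally, convert back with to_timedelta: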

# Assign to a new name so df (and its time column) stays available for the next snippet
out = pd.to_timedelta(df.time.dt.total_seconds().resample('min').mean().fillna(0), unit='s')
print (out)
2017-02-20 12:42:00   00:00:02
2017-02-20 12:43:00   00:00:05
2017-02-20 12:44:00   00:00:00
2017-02-20 12:45:00   00:00:10
Freq: T, Name: time, dtype: timedelta64[ns]
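The `fillna(0)` step reports a minute with no rows as a 0-second mean. If you would rather keep "no data" distinct from "zero duration", a variant of the same total_seconds approach can simply drop that step, so empty minutes come back as NaT. This is a sketch; `secs`, `mean_secs` and `out` are illustrative names, and the day-first date format is assumed as before.

```python
import pandas as pd

# Rebuild the question's frame; the format string assumes day/month/year dates.
df = pd.DataFrame(
    {'time': pd.to_timedelta(['00:00:02', '00:00:05', '00:00:10', '00:00:15', '00:00:05'])},
    index=pd.to_datetime(
        ['20/02/2017 12:42:10', '20/02/2017 12:43:10', '20/02/2017 12:45:10',
         '20/02/2017 12:45:10', '20/02/2017 12:45:10'],
        format='%d/%m/%Y %H:%M:%S'),
)

secs = df['time'].dt.total_seconds()        # float seconds per row
mean_secs = secs.resample('min').mean()     # NaN for minutes that have no rows
out = pd.to_timedelta(mean_secs, unit='s')  # NaN is converted to NaT, not to 0 seconds
print(out)
```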

Converting to nanoseconds instead can improve precision:

import numpy as np
print (pd.Series(df.time.values.astype(np.int64), index=df.index))
2017-02-20 12:42:10     2000000000
2017-02-20 12:43:10     5000000000
2017-02-20 12:45:10    10000000000
2017-02-20 12:45:10    15000000000
2017-02-20 12:45:10     5000000000
dtype: int64
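To see why the integer route loses nothing before averaging, a small check (a sketch with illustrative names `ns` and `back`) converts the column to int64 nanoseconds and back, then compares the result with the original durations.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame; the format string assumes day/month/year dates.
df = pd.DataFrame(
    {'time': pd.to_timedelta(['00:00:02', '00:00:05', '00:00:10', '00:00:15', '00:00:05'])},
    index=pd.to_datetime(
        ['20/02/2017 12:42:10', '20/02/2017 12:43:10', '20/02/2017 12:45:10',
         '20/02/2017 12:45:10', '20/02/2017 12:45:10'],
        format='%d/%m/%Y %H:%M:%S'),
)

# Reinterpret the timedelta64[ns] values as integer nanoseconds, as in the answer.
ns = pd.Series(df['time'].to_numpy().astype(np.int64), index=df.index)
# to_timedelta defaults to unit='ns', so converting back recovers the exact durations.
back = pd.to_timedelta(ns)
print((back.to_numpy() == df['time'].to_numpy()).all())
```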

df = pd.to_timedelta(pd.Series(df.time.values.astype(np.int64), index=df.index)
                                 .resample('min').mean().fillna(0))
print (df)
2017-02-20 12:42:00   00:00:02
2017-02-20 12:43:00   00:00:05
2017-02-20 12:44:00   00:00:00
2017-02-20 12:45:00   00:00:10
Freq: T, dtype: timedelta64[ns]
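If you need this in more than one place, the nanosecond approach can be wrapped in a small function. `resample_timedelta_mean` and its `fill` parameter are hypothetical names, not pandas API; passing `fill=None` keeps empty bins as NaT. This sketch assumes the input column contains no NaT, because casting NaT to int64 yields the minimum int64 value rather than a missing value, which would distort the mean.

```python
import numpy as np
import pandas as pd


def resample_timedelta_mean(s, rule='min', fill=pd.Timedelta(0)):
    """Hypothetical helper: per-bin mean of a timedelta64[ns] Series with a DatetimeIndex.

    Assumes `s` contains no NaT (astype(np.int64) would turn NaT into the
    minimum int64 value instead of a value that mean() skips).
    """
    ns = pd.Series(s.to_numpy().astype(np.int64), index=s.index, name=s.name)
    mean_ns = ns.resample(rule).mean()  # float nanoseconds; NaN for empty bins
    out = pd.to_timedelta(mean_ns)      # default unit is ns; NaN becomes NaT
    return out if fill is None else out.fillna(fill)


# Rebuild the question's frame; the format string assumes day/month/year dates.
df = pd.DataFrame(
    {'time': pd.to_timedelta(['00:00:02', '00:00:05', '00:00:10', '00:00:15', '00:00:05'])},
    index=pd.to_datetime(
        ['20/02/2017 12:42:10', '20/02/2017 12:43:10', '20/02/2017 12:45:10',
         '20/02/2017 12:45:10', '20/02/2017 12:45:10'],
        format='%d/%m/%Y %H:%M:%S'),
)

print(resample_timedelta_mean(df['time']))
```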