我有以下DataFrame:
"super-rare": 0
我想对一年中不同星期或几天进行统计(例如,一年中前7天的平均时长)。我尝试过的是运行
>>> sampleDate = pd.date_range('2018-01-01', periods=10, freq='D 16H')
>>> duration = pd.TimedeltaIndex(data =['123 days 5 hours', '1 day 6 min', '2 days', '23 hours 7 min','5 days 17 hours 3 min','18 min', '1 day 17 hours', '22 day 2 min', '22 hours','15 min'])
>>> df = pd.DataFrame(data={'time': sampleDate, 'duration': duration})
>>> df = df.set_index('time').sort_index()
>>> df
duration
time
2018-01-01 00:00:00 123 days 05:00:00
2018-01-02 16:00:00 1 days 00:06:00
2018-01-04 08:00:00 2 days 00:00:00
2018-01-06 00:00:00 0 days 23:07:00
2018-01-07 16:00:00 5 days 17:03:00
2018-01-09 08:00:00 0 days 00:18:00
2018-01-11 00:00:00 1 days 17:00:00
2018-01-12 16:00:00 22 days 00:02:00
2018-01-14 08:00:00 0 days 22:00:00
2018-01-16 00:00:00 0 days 00:15:00
要获取正确的列将需要什么?
答案 0 :(得分:2)
您可以通过astype
将持续时间转换为原始格式,计数mean
并转换回来:
df['duration'] = df['duration'].astype(np.int64)
df = pd.to_timedelta(df.rolling('7d')['duration'].mean())
print (df)
time
2018-01-01 00:00:00 123 days 05:00:00
2018-01-02 16:00:00 62 days 02:33:00
2018-01-04 08:00:00 42 days 01:42:00
2018-01-06 00:00:00 31 days 19:03:15
2018-01-07 16:00:00 26 days 13:51:12
2018-01-09 08:00:00 1 days 22:30:48
2018-01-11 00:00:00 2 days 01:53:36
2018-01-12 16:00:00 6 days 01:54:00
2018-01-14 08:00:00 6 days 01:40:36
2018-01-16 00:00:00 4 days 22:19:00
Name: duration, dtype: timedelta64[ns]
或将时间增量转换为秒:
df['duration'] = df['duration'].dt.total_seconds()
df1 = df.rolling('7d')['duration'].mean()
print (df1)
time
2018-01-01 00:00:00 10645200.0
2018-01-02 16:00:00 5365980.0
2018-01-04 08:00:00 3634920.0
2018-01-06 00:00:00 2746995.0
2018-01-07 16:00:00 2296272.0
2018-01-09 08:00:00 167448.0
2018-01-11 00:00:00 179616.0
2018-01-12 16:00:00 525240.0
2018-01-14 08:00:00 524436.0
2018-01-16 00:00:00 425940.0
Name: duration, dtype: float64