I have a DataFrame out:
dates min max wh
0 2005-09-06 07:41:18 21:59:57 14:18:39
1 2005-09-12 14:49:22 14:49:22 00:00:00
2 2005-09-19 11:08:56 11:24:05 00:15:09
3 2005-09-21 21:19:21 21:20:15 00:00:54
4 2005-09-22 19:41:52 19:41:52 00:00:00
5 2005-10-13 11:22:07 21:05:41 09:43:34
6 2005-11-22 11:53:12 21:21:22 09:28:10
7 2005-11-23 00:07:01 14:08:50 14:01:49
8 2005-11-30 13:42:48 23:59:19 10:16:31
9 2005-12-01 00:05:16 10:24:12 10:18:56
10 2005-12-21 17:38:43 19:26:03 01:47:20
11 2005-12-22 09:20:07 11:25:40 02:05:33
12 2006-01-23 07:46:20 08:01:52 00:15:32
13 2006-04-27 16:27:54 19:29:52 03:01:58
14 2006-05-11 12:48:34 23:10:44 10:22:10
15 2006-05-15 10:14:59 22:28:12 12:13:13
16 2006-05-16 01:14:07 23:55:51 22:41:44
17 2006-05-17 01:12:45 23:57:56 22:45:11
18 2006-05-18 02:42:08 21:48:49 19:06:41
I want the average daily working time for each month (taken from the wh column).
out['dates'] = pd.to_datetime(out['dates'])
out['month']= pd.PeriodIndex(out.dates, freq='M')
out2=out.groupby('month')['wh'].mean().reset_index(name='wh2')
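This is the approach I have used so far, but the values in wh are not numeric, so the mean cannot be computed. How can I convert the whole wh column so that I can take the mean?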
My wh column is built as follows:
df = pd.read_csv("Testordner2/"+i, parse_dates=True)
df['new_time'] = pd.to_datetime(df['new_time'])
df['dates']= df['new_time'].dt.date
df['time'] = df['new_time'].dt.time
out = df.groupby(df['dates']).agg({'time': ['min', 'max']}) \
.stack(level=0).droplevel(1)
out['min_as_time_format'] = pd.to_datetime(out['min'], format="%H:%M:%S")
out['max_as_time_format'] = pd.to_datetime(out['max'], format="%H:%M:%S")
out['wh'] = out['max_as_time_format'] - out['min_as_time_format']
out['wh'].astype(str).str[-18:-10]
Answer 0 (score: 3)
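One possible solution is to convert the timedeltas to their native integer representation (nanoseconds), aggregate with mean, and then convert the result back to timedeltas: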
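import numpy as np
import pandas as pd
out['dates'] = pd.to_datetime(out['dates'])
out['month'] = pd.PeriodIndex(out.dates, freq='M')
# Store the timedeltas as integer nanoseconds so that mean() can aggregate them
out['wh'] = pd.to_timedelta(out['wh']).astype(np.int64)
# Average per month, then convert the nanosecond means back to timedeltas
out2 = pd.to_timedelta(out.groupby('month')['wh'].mean()).reset_index(name='wh2')
print(out2)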
month wh2
0 2005-09 02:54:56.400000
1 2005-10 09:43:34
2 2005-11 11:15:30
3 2005-12 04:43:56.333333
4 2006-01 00:15:32
5 2006-04 03:01:58
6 2006-05 17:25:47.800000
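For anyone who wants to try the idea without the original CSV files, here is a minimal self-contained sketch. It is a variation on the approach above rather than the same code: it assumes wh holds HH:MM:SS strings, hard-codes the September to November 2005 rows from the question's table as sample data, and goes through float seconds with dt.total_seconds() instead of integer nanoseconds.

```python
import pandas as pd

# Sample data: the Sep-Nov 2005 rows copied from the question's table.
out = pd.DataFrame({
    "dates": ["2005-09-06", "2005-09-12", "2005-09-19", "2005-09-21",
              "2005-09-22", "2005-10-13", "2005-11-22", "2005-11-23",
              "2005-11-30"],
    "wh": ["14:18:39", "00:00:00", "00:15:09", "00:00:54",
           "00:00:00", "09:43:34", "09:28:10", "14:01:49",
           "10:16:31"],
})

out["dates"] = pd.to_datetime(out["dates"])
out["month"] = out["dates"].dt.to_period("M")

# Parse the strings as timedeltas and express them as float seconds,
# which groupby().mean() can aggregate like any other numeric column.
seconds = pd.to_timedelta(out["wh"]).dt.total_seconds()
mean_seconds = seconds.groupby(out["month"]).mean()

# Convert the per-month means back to timedeltas.
out2 = pd.to_timedelta(mean_seconds, unit="s").reset_index(name="wh2")
print(out2)
```

For these three months the result should agree with the first three rows of the output above. Going through seconds keeps the intermediate numbers readable, at the cost of float rounding below the nanosecond level, which does not matter for working hours.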