I have a column 'dateTime', and I'm trying to achieve the following (this works when there is no groupby):
df['time_of_day_10'] = df['dateTime'].dt.floor('10min')
df['time_of_day_30'] = df['dateTime'].dt.floor('30min')
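For reference, here is a minimal self-contained sketch of what those two lines do. The first timestamp comes from the sample data further down; the other two are made-up values chosen only so that the 10-minute and 30-minute buckets differ.

```python
import pandas as pd

# First timestamp is from the sample data; the other two are hypothetical.
df = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2017-10-23 08:00:04.854309",
        "2017-10-23 08:17:06.018212",
        "2017-10-23 08:45:12.145907",
    ])
})

# Round each timestamp down to the start of its 10- and 30-minute bucket.
df["time_of_day_10"] = df["dateTime"].dt.floor("10min")
df["time_of_day_30"] = df["dateTime"].dt.floor("30min")

print(df)
```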
But the problem appears after I group my data:
groups = df.groupby(groupbytime,as_index=True)
df_grouped = (groups.agg({
'clients1': [np.mean,np.max,],
'clients2': [np.mean,np.max,],
}))
I lose dateTime, so I tried to add it back by including it in the aggregation:
groups = df.groupby(groupbytime,as_index=True)
df_grouped = (groups.agg({
'dateTime':['first'],
'clients1': [np.mean,np.max,],
'clients2': [np.mean,np.max,],
}))
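This gives me dateTime values of type:
dateTime first datetime64[ns]
I'm trying to get the rounded time and date as columns in the grouped result. Thanks!
Edit, adding sample data. Original data: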
dateTime Clients1 Clients2
8 2017-10-23 08:00:04.854309 12991.5 2
10 2017-10-23 08:00:04.875162 12991.5 1
11 2017-10-23 08:00:04.875162 12991.5 1
12 2017-10-23 08:00:04.875162 12991.5 1
13 2017-10-23 08:00:04.875162 12991.5 1
23 2017-10-23 08:00:04.876464 12989.5 1
24 2017-10-23 08:00:04.876464 12989.5 1
32 2017-10-23 08:00:04.964356 12990 1
34 2017-10-23 08:00:04.968549 12990.5 1
38 2017-10-23 08:00:05.008758 12990 1
43 2017-10-23 08:00:05.996090 12990 2
45 2017-10-23 08:00:06.018212 12990 1
51 2017-10-23 08:00:06.344568 12989.5 1
56 2017-10-23 08:00:06.903661 12990 1
60 2017-10-23 08:00:07.120324 12990 1
66 2017-10-23 08:00:07.206179 12990.5 1
74 2017-10-23 08:00:07.358889 12991.5 3
77 2017-10-23 08:00:07.491244 12991 1
80 2017-10-23 08:00:07.671106 12991 1
83 2017-10-23 08:00:07.897968 12991 1
87 2017-10-23 08:00:08.028444 12991 1
95 2017-10-23 08:00:09.787827 12991.5 3
98 2017-10-23 08:00:10.178936 12991.5 3
104 2017-10-23 08:00:10.505921 12991.5 2
110 2017-10-23 08:00:11.438628 12992 1
112 2017-10-23 08:00:12.145907 12992 1
The groupby result looks like this:
dateTime Clients1 Clients1 Clients2 Clients2
first mean amax mean amax
1min
2017-10-23 08:00:00 2017-10-23 08:00:04.854309 12988.8902439024 12993.5 227 12987.7398373984
2017-10-23 08:01:00 2017-10-23 08:01:00.005942 12986.92 12988.5 84 12986.28
2017-10-23 08:02:00 2017-10-23 08:02:00.901496 12987.6486486486 12988.5 98 12987
2017-10-23 08:03:00 2017-10-23 08:03:00.521976 12986.8148148148 12987.5 65 12986.1296296296
2017-10-23 08:04:00 2017-10-23 08:04:02.800922 12986.4705882353 12986.5 47 12985.5294117647
2017-10-23 08:05:00 2017-10-23 08:05:00.670865 12985.3658536585 12986 88 12984.7804878049
2017-10-23 08:06:00 2017-10-23 08:06:00.141393 12987.359375 12988 103 12986.734375
2017-10-23 08:07:00 2017-10-23 08:07:00.922107 12987.5454545455 12988 34 12986.7727272727
2017-10-23 08:08:00 2017-10-23 08:08:00.165103 12986.8214285714 12988 46 12986.0714285714
2017-10-23 08:09:00 2017-10-23 08:09:01.910121 12988.96875 12990 145 12988.328125
2017-10-23 08:10:00 2017-10-23 08:10:00.008064 12988.2678571429 12989.5 102 12987.6785714286
2017-10-23 08:11:00 2017-10-23 08:11:05.533862 12989.4318181818 12991 71 12988.8636363636
2017-10-23 08:12:00 2017-10-23 08:12:01.124564 12991.0444444444 12992.5 144 12990.4444444444
2017-10-23 08:13:00 2017-10-23 08:13:00.347987 12992.84375 12995 185 12992.0390625
2017-10-23 08:14:00 2017-10-23 08:14:00.627402 12994.2906976744 12996 216 12993.6395348837
2017-10-23 08:15:00 2017-10-23 08:15:00.032132 12994.8859649123 12996.5 211 12994.298245614
Answer 0 (score: 1)
One possible solution is to apply floor after agg:
df_grouped[('time_of_day_10', 'first')] = df_grouped[('dateTime', 'first')].dt.floor('10min')
df_grouped[('time_of_day_30', 'first')] = df_grouped[('dateTime', 'first')].dt.floor('30min')
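A minimal end-to-end sketch of this approach follows. The question does not show how `groupbytime` is built, so the sketch assumes it is the timestamp floored to one minute (the result index in the question is labelled `1min`). The first two rows come from the sample data; the last two rows use timestamps from the grouped output with made-up client values.

```python
import pandas as pd

df = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2017-10-23 08:00:04.854309",
        "2017-10-23 08:00:12.145907",
        "2017-10-23 08:01:00.005942",  # client values below are hypothetical
        "2017-10-23 08:12:01.124564",  # client values below are hypothetical
    ]),
    "Clients1": [12991.5, 12992.0, 12987.0, 12991.0],
    "Clients2": [2, 1, 1, 3],
})

# Assumption: the grouping key is the timestamp floored to 1 minute.
groupbytime = df["dateTime"].dt.floor("1min").rename("1min")

df_grouped = df.groupby(groupbytime).agg({
    "dateTime": ["first"],
    "Clients1": ["mean", "max"],
    "Clients2": ["mean", "max"],
})

# The aggregated frame has two-level columns, so new columns are
# addressed with (top level, second level) tuples.
first_ts = df_grouped[("dateTime", "first")]
df_grouped[("time_of_day_10", "first")] = first_ts.dt.floor("10min")
df_grouped[("time_of_day_30", "first")] = first_ts.dt.floor("30min")

print(df_grouped)
```

Because `first` keeps the datetime dtype, the `.dt` accessor is still available on the aggregated column.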
Edit: If you need the max date per group, use a custom function with date:
groups = df.groupby('dateTime',as_index=True)
df_grouped = (groups.agg({
'dateTime':[lambda x: x.dt.date.max()],
'Clients1': [np.mean,np.max,],
'Clients2': [np.mean,np.max,],
}))
print (df_grouped.dtypes)
Clients1  mean         float64
          amax         float64
dateTime  <lambda>      object  <- pure Python date is object
Clients2  mean           int64
          amax           int64
dtype: object
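The object dtype comes from `.dt.date`, which returns plain Python `datetime.date` values. The small sketch below contrasts that with flooring to the day, which keeps a pandas datetime dtype. The two timestamps are taken from the sample data; the daily alias is written as uppercase `"D"` here, which denotes the same daily frequency.

```python
import pandas as pd

s = pd.Series(pd.to_datetime([
    "2017-10-23 08:00:04.854309",
    "2017-10-23 08:00:12.145907",
]))

as_date = s.dt.date         # Python datetime.date objects -> object dtype
as_floor = s.dt.floor("D")  # timestamps at midnight -> still a datetime dtype

print(as_date.dtype)
print(as_floor.dtype)

# Only the floored version still supports the .dt accessor.
print(as_floor.dt.year.tolist())
```

If the result is going to be rounded or compared with other timestamps later, keeping the datetime dtype is usually the more convenient choice.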
Or, if you need the max date per group with the time removed, use floor with d:
df_grouped = (groups.agg({
'dateTime':[lambda x: x.dt.floor('d').max()],
'Clients1': [np.mean,np.max,],
'Clients2': [np.mean,np.max,],
}))
print (df_grouped.dtypes)
Clients1  mean              float64
          amax              float64
dateTime  <lambda>   datetime64[ns]  <- floor returns pandas datetime
Clients2  mean                int64
          amax                int64
dtype: object
print (df_grouped)
Clients1 dateTime Clients2
mean amax <lambda> mean amax
dateTime
2017-10-23 08:00:04.854309 12991.5 12991.5 2017-10-23 2 2
2017-10-23 08:00:04.875162 12991.5 12991.5 2017-10-23 1 1
2017-10-23 08:00:04.876464 12989.5 12989.5 2017-10-23 1 1
2017-10-23 08:00:04.964356 12990.0 12990.0 2017-10-23 1 1
2017-10-23 08:00:04.968549 12990.5 12990.5 2017-10-23 1 1
2017-10-23 08:00:05.008758 12990.0 12990.0 2017-10-23 1 1
2017-10-23 08:00:05.996090 12990.0 12990.0 2017-10-23 2 2
2017-10-23 08:00:06.018212 12990.0 12990.0 2017-10-23 1 1
2017-10-23 08:00:06.344568 12989.5 12989.5 2017-10-23 1 1
2017-10-23 08:00:06.903661 12990.0 12990.0 2017-10-23 1 1
2017-10-23 08:00:07.120324 12990.0 12990.0 2017-10-23 1 1
2017-10-23 08:00:07.206179 12990.5 12990.5 2017-10-23 1 1
2017-10-23 08:00:07.358889 12991.5 12991.5 2017-10-23 3 3
2017-10-23 08:00:07.491244 12991.0 12991.0 2017-10-23 1 1
2017-10-23 08:00:07.671106 12991.0 12991.0 2017-10-23 1 1
2017-10-23 08:00:07.897968 12991.0 12991.0 2017-10-23 1 1
2017-10-23 08:00:08.028444 12991.0 12991.0 2017-10-23 1 1
2017-10-23 08:00:09.787827 12991.5 12991.5 2017-10-23 3 3
2017-10-23 08:00:10.178936 12991.5 12991.5 2017-10-23 3 3
2017-10-23 08:00:10.505921 12991.5 12991.5 2017-10-23 2 2
2017-10-23 08:00:11.438628 12992.0 12992.0 2017-10-23 1 1
2017-10-23 08:00:12.145907 12992.0 12992.0 2017-10-23 1 1
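As a variation on the above (not part of the original answer), the rounded times can also be computed before grouping and carried through with `first`, using named aggregation to get flat column names. This is only a sketch under assumptions: the 1-minute grouping key and the output column names are made up for illustration, and the client values in the last two rows are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2017-10-23 08:00:04.854309",
        "2017-10-23 08:00:12.145907",
        "2017-10-23 08:01:00.005942",  # client values below are hypothetical
        "2017-10-23 08:12:01.124564",  # client values below are hypothetical
    ]),
    "Clients1": [12991.5, 12992.0, 12987.0, 12991.0],
    "Clients2": [2, 1, 1, 3],
})

# Compute the rounded times on the raw rows first.
df["time_of_day_10"] = df["dateTime"].dt.floor("10min")
df["time_of_day_30"] = df["dateTime"].dt.floor("30min")

# Assumption: group by the timestamp floored to 1 minute.
key = df["dateTime"].dt.floor("1min").rename("minute")

df_grouped = df.groupby(key).agg(
    first_dateTime=("dateTime", "first"),
    time_of_day_10=("time_of_day_10", "first"),
    time_of_day_30=("time_of_day_30", "first"),
    Clients1_mean=("Clients1", "mean"),
    Clients1_max=("Clients1", "max"),
    Clients2_mean=("Clients2", "mean"),
    Clients2_max=("Clients2", "max"),
).reset_index()  # turn the grouping key into an ordinary column

print(df_grouped)
```

Every row in a 1-minute group shares the same 10- and 30-minute bucket, so taking `first` of those columns loses nothing.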