Using pandas dt functions with groupby

Time: 2018-03-05 09:11:35

Tags: python pandas

I have a column 'dateTime', and I am trying to do the following (this works without groupby):

df['time_of_day_10'] = df['dateTime'].dt.floor('10min')
df['time_of_day_30'] = df['dateTime'].dt.floor('30min')
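
For reference, here is a minimal sketch of what the two lines above produce on a small frame. The three timestamps are made up for illustration and are not part of my real data.

```python
import pandas as pd

# Hypothetical timestamps, chosen only to show the rounding behaviour.
df = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2017-10-23 08:00:04",
        "2017-10-23 08:17:30",
        "2017-10-23 08:44:59",
    ])
})

# dt.floor rounds each timestamp down to the start of its bin.
df["time_of_day_10"] = df["dateTime"].dt.floor("10min")
df["time_of_day_30"] = df["dateTime"].dt.floor("30min")

print(df)
```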

The problem is that after I group my data:

    groups = df.groupby(groupbytime,as_index=True) 
    df_grouped = (groups.agg({
                'clients1': [np.mean,np.max,],
                'clients2': [np.mean,np.max,],
                }))

I lose dateTime, so I tried to bring it back by adding it to the aggregation:

    groups = df.groupby(groupbytime,as_index=True) 
    df_grouped = (groups.agg({
                'dateTime':['first'],
                'clients1': [np.mean,np.max,],
                'clients2': [np.mean,np.max,],
                }))

This gives me a dateTime column of the following type:

dateTime           first               datetime64[ns]

I am trying to get the rounded time and the date as columns in the grouped result. Thanks!

Edit, adding sample data. Original data:

    dateTime    Clients1    Clients2
8   2017-10-23 08:00:04.854309  12991.5 2
10  2017-10-23 08:00:04.875162  12991.5 1
11  2017-10-23 08:00:04.875162  12991.5 1
12  2017-10-23 08:00:04.875162  12991.5 1
13  2017-10-23 08:00:04.875162  12991.5 1
23  2017-10-23 08:00:04.876464  12989.5 1
24  2017-10-23 08:00:04.876464  12989.5 1
32  2017-10-23 08:00:04.964356  12990   1
34  2017-10-23 08:00:04.968549  12990.5 1
38  2017-10-23 08:00:05.008758  12990   1
43  2017-10-23 08:00:05.996090  12990   2
45  2017-10-23 08:00:06.018212  12990   1
51  2017-10-23 08:00:06.344568  12989.5 1
56  2017-10-23 08:00:06.903661  12990   1
60  2017-10-23 08:00:07.120324  12990   1
66  2017-10-23 08:00:07.206179  12990.5 1
74  2017-10-23 08:00:07.358889  12991.5 3
77  2017-10-23 08:00:07.491244  12991   1
80  2017-10-23 08:00:07.671106  12991   1
83  2017-10-23 08:00:07.897968  12991   1
87  2017-10-23 08:00:08.028444  12991   1
95  2017-10-23 08:00:09.787827  12991.5 3
98  2017-10-23 08:00:10.178936  12991.5 3
104 2017-10-23 08:00:10.505921  12991.5 2
110 2017-10-23 08:00:11.438628  12992   1
112 2017-10-23 08:00:12.145907  12992   1

The groupby result looks like this:

    dateTime    Clients1    Clients1    Clients2    Clients2
    first   mean    amax    mean    amax
1min                    
2017-10-23 08:00:00 2017-10-23 08:00:04.854309  12988.8902439024    12993.5 227 12987.7398373984
2017-10-23 08:01:00 2017-10-23 08:01:00.005942  12986.92    12988.5 84  12986.28
2017-10-23 08:02:00 2017-10-23 08:02:00.901496  12987.6486486486    12988.5 98  12987
2017-10-23 08:03:00 2017-10-23 08:03:00.521976  12986.8148148148    12987.5 65  12986.1296296296
2017-10-23 08:04:00 2017-10-23 08:04:02.800922  12986.4705882353    12986.5 47  12985.5294117647
2017-10-23 08:05:00 2017-10-23 08:05:00.670865  12985.3658536585    12986   88  12984.7804878049
2017-10-23 08:06:00 2017-10-23 08:06:00.141393  12987.359375    12988   103 12986.734375
2017-10-23 08:07:00 2017-10-23 08:07:00.922107  12987.5454545455    12988   34  12986.7727272727
2017-10-23 08:08:00 2017-10-23 08:08:00.165103  12986.8214285714    12988   46  12986.0714285714
2017-10-23 08:09:00 2017-10-23 08:09:01.910121  12988.96875 12990   145 12988.328125
2017-10-23 08:10:00 2017-10-23 08:10:00.008064  12988.2678571429    12989.5 102 12987.6785714286
2017-10-23 08:11:00 2017-10-23 08:11:05.533862  12989.4318181818    12991   71  12988.8636363636
2017-10-23 08:12:00 2017-10-23 08:12:01.124564  12991.0444444444    12992.5 144 12990.4444444444
2017-10-23 08:13:00 2017-10-23 08:13:00.347987  12992.84375 12995   185 12992.0390625
2017-10-23 08:14:00 2017-10-23 08:14:00.627402  12994.2906976744    12996   216 12993.6395348837
2017-10-23 08:15:00 2017-10-23 08:15:00.032132  12994.8859649123    12996.5 211 12994.298245614

1 Answer:

Answer 0 (score: 1)

One possible solution is to apply floor after agg:

df_grouped[('time_of_day_10', 'first')] = df_grouped[('dateTime', 'first')].dt.floor('10min')
df_grouped[('time_of_day_30', 'first')] = df_grouped[('dateTime', 'first')].dt.floor('30min')
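
A self-contained sketch of the same idea follows. The four rows are hypothetical, and the one-minute key is only an assumption standing in for `groupbytime`, which the question does not define.

```python
import pandas as pd

# Hypothetical rows shaped like the question's sample data.
df = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2017-10-23 08:00:04.854309",
        "2017-10-23 08:00:07.358889",
        "2017-10-23 08:01:00.005942",
        "2017-10-23 08:12:01.124564",
    ]),
    "Clients1": [12991.5, 12990.0, 12988.5, 12992.5],
    "Clients2": [2, 3, 1, 1],
})

# Assumption: one-minute bins stand in for the undefined `groupbytime`.
key = df["dateTime"].dt.floor("1min").rename("1min")

df_grouped = df.groupby(key).agg({
    "dateTime": ["first"],
    "Clients1": ["mean", "max"],
    "Clients2": ["mean", "max"],
})

# The aggregated columns form a MultiIndex, so each new column is addressed by a tuple.
first = df_grouped[("dateTime", "first")]
df_grouped[("time_of_day_10", "first")] = first.dt.floor("10min")
df_grouped[("time_of_day_30", "first")] = first.dt.floor("30min")

print(df_grouped)
```

Because `first` is still a datetime column after the aggregation, the `.dt` accessor works on it just as it does on the ungrouped frame.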

Edit: if you need the maximum date per group, use a custom function with date:

groups = df.groupby('dateTime',as_index=True) 
df_grouped = (groups.agg({
                'dateTime':[lambda x: x.dt.date.max()],
                 'Clients1': [np.mean,np.max,],
                 'Clients2': [np.mean,np.max,],
                 }))

print (df_grouped.dtypes)
Clients1  mean        float64
          amax        float64
dateTime  <lambda>     object <- a pure Python date has dtype object
Clients2  mean          int64
          amax          int64
dtype: object

Or, if you need the maximum date per group as a pandas datetime, use floor with 'd':

df_grouped = (groups.agg({
                'dateTime':[lambda x: x.dt.floor('d').max()],
                 'Clients1': [np.mean,np.max,],
                 'Clients2': [np.mean,np.max,],
                 }))

print (df_grouped.dtypes)
Clients1  mean               float64
          amax               float64
dateTime  <lambda>    datetime64[ns] <- floor returns a pandas datetime
Clients2  mean                 int64
          amax                 int64
dtype: object
print (df_grouped)
                           Clients1             dateTime Clients2     
                               mean     amax    <lambda>     mean amax
dateTime                                                              
2017-10-23 08:00:04.854309  12991.5  12991.5  2017-10-23        2    2
2017-10-23 08:00:04.875162  12991.5  12991.5  2017-10-23        1    1
2017-10-23 08:00:04.876464  12989.5  12989.5  2017-10-23        1    1
2017-10-23 08:00:04.964356  12990.0  12990.0  2017-10-23        1    1
2017-10-23 08:00:04.968549  12990.5  12990.5  2017-10-23        1    1
2017-10-23 08:00:05.008758  12990.0  12990.0  2017-10-23        1    1
2017-10-23 08:00:05.996090  12990.0  12990.0  2017-10-23        2    2
2017-10-23 08:00:06.018212  12990.0  12990.0  2017-10-23        1    1
2017-10-23 08:00:06.344568  12989.5  12989.5  2017-10-23        1    1
2017-10-23 08:00:06.903661  12990.0  12990.0  2017-10-23        1    1
2017-10-23 08:00:07.120324  12990.0  12990.0  2017-10-23        1    1
2017-10-23 08:00:07.206179  12990.5  12990.5  2017-10-23        1    1
2017-10-23 08:00:07.358889  12991.5  12991.5  2017-10-23        3    3
2017-10-23 08:00:07.491244  12991.0  12991.0  2017-10-23        1    1
2017-10-23 08:00:07.671106  12991.0  12991.0  2017-10-23        1    1
2017-10-23 08:00:07.897968  12991.0  12991.0  2017-10-23        1    1
2017-10-23 08:00:08.028444  12991.0  12991.0  2017-10-23        1    1
2017-10-23 08:00:09.787827  12991.5  12991.5  2017-10-23        3    3
2017-10-23 08:00:10.178936  12991.5  12991.5  2017-10-23        3    3
2017-10-23 08:00:10.505921  12991.5  12991.5  2017-10-23        2    2
2017-10-23 08:00:11.438628  12992.0  12992.0  2017-10-23        1    1
2017-10-23 08:00:12.145907  12992.0  12992.0  2017-10-23        1    1
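
As an alternative that is not part of the original answer, you could group on the floored timestamps themselves, so that the rounded time comes back as an ordinary column after `reset_index`. This is only a sketch: the rows are hypothetical, and the output column names (`clients1_mean` and so on) are made up for illustration.

```python
import pandas as pd

# Hypothetical rows shaped like the question's sample data.
df = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2017-10-23 08:00:04.854309",
        "2017-10-23 08:09:01.910121",
        "2017-10-23 08:12:01.124564",
        "2017-10-23 08:15:00.032132",
    ]),
    "Clients1": [12991.5, 12990.0, 12992.5, 12996.5],
    "Clients2": [2, 1, 3, 1],
})

# Use the 10-minute floor as the grouping key and give it a column name.
key = df["dateTime"].dt.floor("10min").rename("time_of_day_10")

out = (
    df.groupby(key)
      .agg(
          clients1_mean=("Clients1", "mean"),
          clients1_max=("Clients1", "max"),
          clients2_mean=("Clients2", "mean"),
          clients2_max=("Clients2", "max"),
      )
      .reset_index()  # turn the grouping key back into a regular column
)

# Coarser bins and the calendar date can be derived from the key column.
out["time_of_day_30"] = out["time_of_day_10"].dt.floor("30min")
out["date"] = out["time_of_day_10"].dt.floor("D")

print(out)
```

Named aggregation gives flat column names, which avoids the tuple keys needed with a MultiIndex, and only the bins that actually contain rows appear in the result.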