Pandas: group by week and get the day of the maximum

Time: 2019-08-05 06:51:42

Tags: pandas timestamp pandas-groupby

Suppose I have test data like the following:

import pandas as pd

data_dic = {
    "day": ['2019-01-18', '2019-01-18', '2019-01-18', '2019-01-19',
            '2019-01-19','2019-01-25', '2019-02-19', '2019-02-24'],
    "data": [0, 1,3,3, 0, 1,2 ,5],
    "col2": [10, 11,1,1, 10, 1,2, 5],
    "col3": [5, 6,7,8, 9, 1,2, 5]
}

df = pd.DataFrame(data_dic)
df.index = pd.to_datetime(df.day)
df = df.drop(['day'], axis=1)
# DatetimeIndex.weekday_name was removed in pandas 1.0; day_name() is the replacement.
df.insert(0, 'day_name', df.index.day_name())

Result:

            day_name  data  col2  col3
day                                   
2019-01-18    Friday     0    10     5
2019-01-18    Friday     1    11     6
2019-01-18    Friday     3     1     7
2019-01-19  Saturday     3     1     8
2019-01-19  Saturday     0    10     9
2019-01-25    Friday     1     1     1
2019-02-19   Tuesday     2     2     2
2019-02-24    Sunday     5     5     5

Now I need to group this data by week and take the maximum value of col2 in each week. I did that as follows:

df = df.groupby(df.index.to_period("w")).agg({'col2':'max'})

Result:

                       col2
day                        
2019-01-14/2019-01-20    11
2019-01-21/2019-01-27     1
2019-02-18/2019-02-24     5

Question: how can I get the date on which each group's maximum value occurred?

Expected result:

                       col2        day
day                                   
2019-01-14/2019-01-20    11 2019-01-18
2019-01-21/2019-01-27     1 2019-01-25
2019-02-18/2019-02-24     5 2019-02-24

Thanks for your time and effort.

1 answer:

Answer 0 (score: 2)

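Use DataFrameGroupBy.idxmax together with a modified GroupBy.agg call: select the column name after the groupby and pass a list of (output name, function) tuples. This must run on the original df with its datetime index, not on the already-aggregated result: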

df1 = df.groupby(df.index.to_period("w"))['col2'].agg([('col2','max'), ('day','idxmax')])
print (df1)
                       col2        day
day                                   
2019-01-14/2019-01-20    11 2019-01-18
2019-01-21/2019-01-27     1 2019-01-25
2019-02-18/2019-02-24     5 2019-02-24
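
Why this works: idxmax returns the index label of the first row that holds the maximum, and here the index holds the dates. Below is a minimal sketch on a single week of values, taken from the first week of the sample data. The variable names week, week_max and week_day are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# col2 values for the week 2019-01-14/2019-01-20, indexed by date.
week = pd.Series(
    [10, 11, 1, 1, 10],
    index=pd.to_datetime(['2019-01-18', '2019-01-18', '2019-01-18',
                          '2019-01-19', '2019-01-19']),
    name='col2',
)

week_max = week.max()     # largest value in the week
week_day = week.idxmax()  # index label (a Timestamp) of the first row with that value

print(week_max, week_day.date())  # prints: 11 2019-01-18
```

If the maximum occurs on several rows, idxmax reports the first one in row order, so ties resolve to the earliest row in the group.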

Solution for pandas 0.25+ (named aggregation):

df.groupby(df.index.to_period("w")).agg(col2=pd.NamedAgg(column='col2', aggfunc='max'),
                                        day=pd.NamedAgg(column='col2', aggfunc='idxmax'))
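
For anyone who wants to run this end to end, here is a self-contained sketch of the named-aggregation variant. It assumes a reasonably recent pandas, rebuilds only the col2 column from the question, and uses the (column, aggfunc) tuple shorthand that pandas accepts in place of pd.NamedAgg. The name weekly is illustrative and not from the original answer.

```python
import pandas as pd

# Rebuild the relevant part of the sample data from the question.
data_dic = {
    "day": ['2019-01-18', '2019-01-18', '2019-01-18', '2019-01-19',
            '2019-01-19', '2019-01-25', '2019-02-19', '2019-02-24'],
    "col2": [10, 11, 1, 1, 10, 1, 2, 5],
}
df = pd.DataFrame(data_dic)
df.index = pd.to_datetime(df["day"])
df = df.drop(columns="day")

# Group by weekly period and compute both the maximum and the date it occurred on.
# Keep the original df intact so the datetime index stays available.
weekly = df.groupby(df.index.to_period("W")).agg(
    col2=("col2", "max"),    # weekly maximum of col2
    day=("col2", "idxmax"),  # index label (date) of that maximum
)
print(weekly)
```

Assigning the result to a new name rather than back to df matters here: once df is overwritten with the aggregated frame, its index is a PeriodIndex and the dates needed by idxmax are gone.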