Python - Group by date (passing the date ranges as parameters)

Posted: 2018-04-10 13:50:44

Tags: python-3.x pandas

I have the following DataFrame (the dates in `dt` are day-month-year):

    +---------+----------+----------+
    | name    | dt       | tot popu |
    +---------+----------+----------+
    | hyd     | 10-01-17 |        3 |
    | hyd     | 20-01-17 |        4 |
    | hyd     | 05-05-17 |        3 |
    | pune    | 03-05-17 |        4 |
    | pune    | 06-08-17 |        5 |
    | pune    | 10-06-17 |        6 |
    | mumbai  | 18-04-17 |        4 |
    | mumbai  | 20-04-17 |        4 |
    | mumbai  | 30-03-17 |        2 |
    +---------+----------+----------+

I want to group this DataFrame by city, using the date as the grouper. The following works with a monthly frequency:

x = df.groupby(['name', pd.Grouper(key = 'dt', freq = 'M')])['tot popu'].sum().reset_index()
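For reference, `pd.Grouper` with a `freq` only works on a datetime column, so the day-month-year strings in `dt` have to be parsed first. A minimal sketch, assuming the table above is loaded as-is (column names `name`, `dt` and `tot popu` taken from it). It uses `freq='MS'`, which labels each month by its first day; the question's `'M'` labels by month end and is spelled `'ME'` in pandas 2.2 and later.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    'name': ['hyd', 'hyd', 'hyd', 'pune', 'pune', 'pune',
             'mumbai', 'mumbai', 'mumbai'],
    'dt': ['10-01-17', '20-01-17', '05-05-17', '03-05-17', '06-08-17',
           '10-06-17', '18-04-17', '20-04-17', '30-03-17'],
    'tot popu': [3, 4, 3, 4, 5, 6, 4, 4, 2],
})

# Parse day-month-year strings into real datetimes
df['dt'] = pd.to_datetime(df['dt'], format='%d-%m-%y')

# Group by city and calendar month ('MS' = month start label)
monthly = (df.groupby(['name', pd.Grouper(key='dt', freq='MS')])['tot popu']
             .sum()
             .reset_index())
print(monthly)
```

The fixed calendar months are exactly what the question wants to move away from, which is why the answer below switches to explicit bin edges.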

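However, I want to supply my own frequency, because it corresponds to specific periods I choose, e.g. (01/01/17 to 02/15/17), (02/16/17 to 03/17/2017), and so on. Expected output: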

    (city)   dt           tot popu
    hyd      02/15/17     x
    hyd      03/17/2017   x
    hyd      04/16/2017   x

1 Answer:

Answer 0 (score: 0)

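You can use pandas.cut to define whatever bins you need, then group by that bin and the city. You just need to be careful when defining the bins, and use the `right` parameter to get the intervals you want.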
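To see why `right` matters, here is a small sketch (the three dates are made up for illustration) comparing how boundary dates fall with `right=False` versus the default `right=True`:

```python
import pandas as pd

# A period start, the last day of that period, and the next period's start
dates = pd.Series(pd.to_datetime(['2017-01-01', '2017-02-15', '2017-02-16']))
edges = pd.to_datetime(['2017-01-01', '2017-02-16', '2017-03-18'])

# right=False: bins are [start, end), so each edge starts a new period
left_closed = pd.cut(dates, bins=edges, right=False)
# right=True (default): bins are (start, end], so the very first edge is
# excluded (NaN) and 2017-02-16 lands in the earlier bin
right_closed = pd.cut(dates, bins=edges, right=True)

print(pd.DataFrame({'date': dates,
                    'right=False': left_closed,
                    'right=True': right_closed}))
```

With edges given as period start dates, `right=False` is the setting that keeps every date in the period it starts.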

import pandas as pd

df = pd.DataFrame({'name': ['hyd', 'hyd', 'hyd', 'pune', 'pune', 'pune',
                            'mumbai', 'mumbai', 'mumbai'],
                   'date': ['10-01-17', '20-01-17', '05-05-17', '03-05-17',
                            '06-08-17', '10-06-17', '18-04-17', '20-04-17', '30-03-17'],
                   'tot_pop': [3, 4, 3, 4, 5, 6, 4, 4, 2]})
# The dates are day-month-year strings
df['date'] = pd.to_datetime(df.date, format='%d-%m-%y')

# Bin edges are month/day/year; with right=False each bin is closed on the left,
# e.g. [2017-01-01, 2017-02-16) covers 01/01/17 through 02/15/17
bins = [pd.to_datetime('01/01/17'), pd.to_datetime('02/16/17'), pd.to_datetime('03/18/17'),
        pd.to_datetime('03/18/18')]

df['bin'] = pd.cut(df.date, bins=bins, right=False)
# observed=True keeps only the (name, bin) combinations that actually occur
df.groupby(['name', 'bin'], observed=True)['tot_pop'].sum().reset_index()

#     name                       bin  tot_pop
#0     hyd  [2017-01-01, 2017-02-16)        7
#1     hyd  [2017-03-18, 2018-03-18)        3
#2  mumbai  [2017-03-18, 2018-03-18)       10
#3    pune  [2017-03-18, 2018-03-18)       15
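If you also want each group labelled by the last day of its period, as in the expected output in the question, `pd.cut`'s `labels` parameter can name the bins. A sketch under these assumptions: the period edges are the ones listed in the question (the final edge, 04/17/17, is inferred from the 04/16/2017 row), labels are formatted as mm/dd/yy, and `period_end` is a hypothetical column name. Dates outside every period become NaN and are dropped by groupby, so with this sample data pune has no rows at all.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['hyd', 'hyd', 'hyd', 'pune', 'pune', 'pune',
             'mumbai', 'mumbai', 'mumbai'],
    'dt': ['10-01-17', '20-01-17', '05-05-17', '03-05-17', '06-08-17',
           '10-06-17', '18-04-17', '20-04-17', '30-03-17'],
    'tot popu': [3, 4, 3, 4, 5, 6, 4, 4, 2],
})
df['dt'] = pd.to_datetime(df['dt'], format='%d-%m-%y')

# Assumed period starts (from the question); with right=False every bin
# is [start, next_start)
edges = pd.to_datetime(['01/01/17', '02/16/17', '03/18/17', '04/17/17'],
                       format='%m/%d/%y')
# Label each period by its last day, i.e. the day before the next edge
labels = [(e - pd.Timedelta(days=1)).strftime('%m/%d/%y') for e in edges[1:]]

df['period_end'] = pd.cut(df['dt'], bins=edges, right=False, labels=labels)

# Rows whose date falls outside every period have NaN and are dropped here
result = (df.groupby(['name', 'period_end'], observed=True)['tot popu']
            .sum()
            .reset_index())
print(result)
```

Because the labels are derived from `edges`, adding further period starts to that list is enough to cover later months without the two getting out of sync.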