I have the following DataFrame:
+--------+----------+----------+
| name   | dt       | tot popu |
+--------+----------+----------+
| hyd    | 10-01-17 | 3        |
| hyd    | 20-01-17 | 4        |
| hyd    | 05-05-17 | 3        |
| pune   | 03-05-17 | 4        |
| pune   | 06-08-17 | 5        |
| pune   | 10-06-17 | 6        |
| mumbai | 18-04-17 | 4        |
| mumbai | 20-04-17 | 4        |
| mumbai | 30-03-17 | 2        |
+--------+----------+----------+
I want to group this DataFrame by city, using the date as a grouper. The following works for a monthly frequency:
x = df.groupby(['name', pd.Grouper(key='dt', freq='M')])['tot popu'].sum().reset_index()
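But I want to supply a frequency of my own choosing, because it has to match specific periods that I pick (01/01/17 to 02/15/17), (02/16/17 to 03/17/2017), and so on, so that the result looks like this: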
(city) dt tot popu
hyd 02/15/17 x
hyd 03/17/2017 x
hyd 04/16/2017 x
Answer 0 (score: 0)
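You can use pandas.cut to specify whatever bins you need, then group by that bin and by city. You just need to be careful when defining the bin edges, and use the right parameter to control which side of each bin is closed.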
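The effect of `right` is easiest to see on a date that falls exactly on a bin edge. The snippet below is only a small sketch; the two sample dates and the variable names are made up for illustration.

```python
import pandas as pd

# Two illustrative dates: the day before a bin edge, and the edge itself.
dates = pd.Series(pd.to_datetime(['2017-02-15', '2017-02-16']))
edges = [pd.Timestamp('2017-01-01'), pd.Timestamp('2017-02-16'),
         pd.Timestamp('2017-03-18')]

# right=True (the default): bins are (left, right], so the edge date
# 2017-02-16 still belongs to the first bin.
closed_right = pd.cut(dates, bins=edges, right=True)

# right=False: bins are [left, right), so 2017-02-16 opens the second bin.
closed_left = pd.cut(dates, bins=edges, right=False)

print(closed_right.cat.codes.tolist())  # bin index of each date
print(closed_left.cat.codes.tolist())
```

For periods such as 01/01/17 to 02/15/17 followed by 02/16/17 to 03/17/17, `right=False` with edges placed on the first day of each period keeps every date in exactly one period.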
import pandas as pd

df = pd.DataFrame({'name': ['hyd', 'hyd', 'hyd', 'pune', 'pune', 'pune',
                            'mumbai', 'mumbai', 'mumbai'],
                   'date': ['10-01-17', '20-01-17', '05-05-17', '03-05-17',
                            '06-08-17', '10-06-17', '18-04-17', '20-04-17', '30-03-17'],
                   'tot_pop': [3, 4, 3, 4, 5, 6, 4, 4, 2]})
# The dates are day-month-year, so parse them with an explicit format.
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%y')

# Bin edges (month/day/year). With right=False each bin is [left, right),
# so the period 01/01/17 to 02/15/17 ends at the edge 02/16/17.
bins = [pd.to_datetime('01/01/17'), pd.to_datetime('02/16/17'),
        pd.to_datetime('03/18/17'), pd.to_datetime('03/18/18')]
df['bin'] = pd.cut(df['date'], bins=bins, right=False)

# observed=True keeps only the (name, bin) pairs that actually occur.
df.groupby(['name', 'bin'], observed=True)['tot_pop'].sum().reset_index()
#     name                       bin  tot_pop
#0     hyd  [2017-01-01, 2017-02-16)        7
#1     hyd  [2017-03-18, 2018-03-18)        3
#2  mumbai  [2017-03-18, 2018-03-18)       10
#3    pune  [2017-03-18, 2018-03-18)       15
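The question asks for each period to be shown by its end date rather than as an interval. One possible way to get there, sketched under assumptions, is to pass `labels` to `pandas.cut`. The column name `period_end` and the label format are illustrative choices, and the edges are simply the ones used in this answer; further edges can be appended for further periods.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['hyd', 'hyd', 'hyd', 'pune', 'pune', 'pune',
             'mumbai', 'mumbai', 'mumbai'],
    'date': ['10-01-17', '20-01-17', '05-05-17', '03-05-17', '06-08-17',
             '10-06-17', '18-04-17', '20-04-17', '30-03-17'],
    'tot_pop': [3, 4, 3, 4, 5, 6, 4, 4, 2],
})
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%y')

# Half-open edges [left, right): each edge is the first day of a period.
edges = [pd.Timestamp('2017-01-01'), pd.Timestamp('2017-02-16'),
         pd.Timestamp('2017-03-18'), pd.Timestamp('2018-03-18')]

# Assumption: label each bin by its last included day (next edge minus one day).
labels = [(edge - pd.Timedelta(days=1)).strftime('%m/%d/%Y')
          for edge in edges[1:]]

# 'period_end' is a hypothetical column name chosen for this sketch.
df['period_end'] = pd.cut(df['date'], bins=edges, right=False, labels=labels)

# observed=True drops (name, period) combinations that have no rows.
result = (df.groupby(['name', 'period_end'], observed=True)['tot_pop']
            .sum()
            .reset_index())
print(result)
```

Dates outside all of the edges get a missing label and are left out of the grouped result, so the edges should cover the full date range of the data.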