Resample and aggregate a DataFrame by date and city

Time: 2020-04-10 11:40:31

Tags: python pandas dataframe

I want to group a DataFrame by date and add a column holding the number of rows for each city. df:

+-----------------+-------------------+------------+
| booking_date    |  Cities           |  province  | 
+-----------------+-------------------+------------+
|  15-12-17       |  Kota Depok       | Jawa Barat |    
|  15-12-17       |  Bogor            | Jawa Barat |      
|  15-12-17       |  Kota Depok       | Jawa Barat |     
|  15-12-17       |  Kota Bandung     | Jawa Barat |    
|  15-12-17       |  Kota Bandung     | Jawa Barat |   
+-----------------+-------------------+------------+

The output should look like this:

df:

+-----------------+-------------------+------------+------------+
| booking_date    |  Cities           |  province  |  Count     | 
+-----------------+-------------------+------------+------------+
|  15-12-17       |  Kota Depok       | Jawa Barat |  2         |
|  15-12-17       |  Bogor            | Jawa Barat |  1         |
|  15-12-17       |  Kota Bandung     | Jawa Barat |  2         | 
+-----------------+-------------------+------------+------------+

How can I achieve this?

2 answers:

Answer 0: (score: 2)

Use GroupBy.size together with the name parameter of Series.reset_index:

df = df.groupby(['booking_date','Cities','province']).size().reset_index(name='Count')
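
For reference, here is a minimal self-contained sketch of that one-liner, with the sample frame rebuilt from the table in the question. The variable name counts and the sort=False argument are additions for illustration and are not part of the answer; sort=False is only there so the groups come out in order of first appearance, as in the desired output, whereas the default sorts the group keys.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "booking_date": ["15-12-17"] * 5,
    "Cities": ["Kota Depok", "Bogor", "Kota Depok",
               "Kota Bandung", "Kota Bandung"],
    "province": ["Jawa Barat"] * 5,
})

# size() counts the rows in each (date, city, province) group;
# reset_index(name="Count") turns the resulting Series back into a
# DataFrame and names the count column.
# sort=False (an addition, not in the answer) keeps first-appearance order.
counts = (
    df.groupby(["booking_date", "Cities", "province"], sort=False)
      .size()
      .reset_index(name="Count")
)
print(counts)
```

Without sort=False the counts are the same, but the rows are ordered alphabetically by the group keys.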

Answer 1: (score: 1)

The first solution that came to mind was the same as @jezrael's. Another option, however, is to combine pandas.DataFrame.assign(), pandas.Series.map(), pandas.Series.value_counts() and pandas.DataFrame.drop_duplicates().

The code is as follows.

>>> df = df\
...     .assign(Count = df['Cities'].map(df['Cities'].value_counts()))\
...     .drop_duplicates()
>>> print(df)
  booking_date        Cities    province  Count
0     15-12-17    Kota Depok  Jawa Barat      2
1     15-12-17         Bogor  Jawa Barat      1
3     15-12-17  Kota Bandung  Jawa Barat      2
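
One caveat with the snippet above: value_counts() on df['Cities'] counts each city across the whole frame, so it matches the per-date count only while every row shares the same booking_date, as in the question's sample. A possible variant that counts per date, city and province is sketched below using groupby(...).transform("size"). The second date (16-12-17) and the names keys and result are hypothetical, added only to show the difference.

```python
import pandas as pd

# Hypothetical data: the 16-12-17 rows are NOT from the question; they are
# added only so that the same city appears on two different dates.
df = pd.DataFrame({
    "booking_date": ["15-12-17", "15-12-17", "16-12-17", "16-12-17", "16-12-17"],
    "Cities": ["Kota Depok", "Kota Depok", "Kota Depok", "Bogor", "Bogor"],
    "province": ["Jawa Barat"] * 5,
})

keys = ["booking_date", "Cities", "province"]

# transform("size") returns, for every row, the size of the group that row
# belongs to, aligned with the original index, so it can be assigned directly.
result = (
    df.assign(Count=df.groupby(keys)["Cities"].transform("size"))
      .drop_duplicates()
)
print(result)
```

Here Kota Depok gets a count of 2 on 15-12-17 and 1 on 16-12-17, whereas mapping value_counts() of the Cities column would assign 3 to both rows.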