I want to resample the DataFrame by date and create a column holding the count for each city.
df:
+-----------------+-------------------+------------+
| booking_date | Cities | province |
+-----------------+-------------------+------------+
| 15-12-17 | Kota Depok | Jawa Barat |
| 15-12-17 | Bogor | Jawa Barat |
| 15-12-17 | Kota Depok | Jawa Barat |
| 15-12-17 | Kota Bandung | Jawa Barat |
| 15-12-17 | Kota Bandung | Jawa Barat |
+-----------------+-------------------+------------+
The output should look like this:
df:
+-----------------+-------------------+------------+------------+
| booking_date | Cities | province | Count |
+-----------------+-------------------+------------+------------+
| 15-12-17 | Kota Depok | Jawa Barat | 2 |
| 15-12-17 | Bogor | Jawa Barat | 1 |
| 15-12-17 | Kota Bandung | Jawa Barat | 2 |
+-----------------+-------------------+------------+------------+
How can I achieve this?
Answer 0 (score: 2)
Use GroupBy.size together with Series.reset_index and its name parameter:
df = df.groupby(['booking_date','Cities','province']).size().reset_index(name='Count')
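A self-contained sketch of the line above, using sample data rebuilt from the question's table. One assumption differs from the answer: sort=False is passed to groupby so the groups come out in first-appearance order and match the expected output. With the default sort=True the rows are sorted alphabetically by city (Bogor, Kota Bandung, Kota Depok), but the counts are the same.

```python
import pandas as pd

# Sample data reconstructed from the question's table
df = pd.DataFrame({
    'booking_date': ['15-12-17'] * 5,
    'Cities': ['Kota Depok', 'Bogor', 'Kota Depok', 'Kota Bandung', 'Kota Bandung'],
    'province': ['Jawa Barat'] * 5,
})

# size() counts the rows in each (date, city, province) group;
# reset_index(name='Count') turns the resulting Series back into a DataFrame.
# sort=False keeps groups in order of first appearance (an addition, not in the answer).
out = (df.groupby(['booking_date', 'Cities', 'province'], sort=False)
         .size()
         .reset_index(name='Count'))
print(out)
```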
Answer 1 (score: 1)
The first solution that came to my mind was the same as @jezrael's. Another option, however, combines pandas.DataFrame.assign(), pandas.Series.map(), pandas.Series.value_counts() and pandas.DataFrame.drop_duplicates().
The code is as follows:
>>> df = df\
... .assign(Count = df['Cities'].map(df['Cities'].value_counts()))\
... .drop_duplicates()
>>> print(df)
booking_date Cities province Count
0 15-12-17 Kota Depok Jawa Barat 2
1 15-12-17 Bogor Jawa Barat 1
3 15-12-17 Kota Bandung Jawa Barat 2
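One caveat: df['Cities'].value_counts() counts each city over the whole frame, not per date. With several booking dates, a city's count would therefore mix dates. The sketch below assumes the goal is a count per (booking_date, Cities, province) combination and swaps in groupby(...).transform('size'), which broadcasts each group's size back onto every row. The second booking date in the sample (16-12-17) is invented purely for illustration.

```python
import pandas as pd

# Hypothetical sample: the question's rows plus one extra row on a second date
df = pd.DataFrame({
    'booking_date': ['15-12-17'] * 5 + ['16-12-17'],
    'Cities': ['Kota Depok', 'Bogor', 'Kota Depok', 'Kota Bandung',
               'Kota Bandung', 'Kota Depok'],
    'province': ['Jawa Barat'] * 6,
})

keys = ['booking_date', 'Cities', 'province']
# transform('size') returns one value per original row (the size of that row's group),
# so the count is per date and city rather than per city across all dates
out = (df.assign(Count=df.groupby(keys)['Cities'].transform('size'))
         .drop_duplicates(subset=keys))
print(out)
```

Here Kota Depok gets 2 on 15-12-17 and 1 on 16-12-17. The value_counts() version would assign 3 to both rows.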