Question

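I have a pandas DataFrame with monthly counts at several hierarchy levels. It is in long format, and I want to convert it to wide format by aggregating.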

It is of the following format:
date  | country | state | population | Vitals
01-01 | cc1     | s1    | 5          | 20
01-01 | cc1     | s2    | 4          | 20
01-01 | cc2     | s3    | 10         | 35
01-01 | cc2     | s4    | 11         | 35
01-01 | cc3     | s5    | 12         | 20
01-01 | cc3     | s6    | 12         | 20
02-01 | cc1     | s1    | 6          | 25
02-01 | cc1     | s2    | 5          | 25
02-01 | cc2     | s3    | 11         | 40
02-01 | cc2     | s4    | 12         | 40
02-01 | cc3     | s5    | 11         | 40
02-01 | cc3     | s6    | 12         | 40


I want to transform this into the following format:
date | population | vital sums
01-01| 54         | 75
02-01| 57         | 105

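Essentially, population is additive, so it is simply summed per date. Vitals is recorded per date and country (the value repeats on every state row of that country), so it should be counted once per country and then summed across countries for each date. Is there a way to do this kind of aggregation?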

Edit: Can this be done with .agg()?

Answer 1

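You can aggregate population with sum. For Vitals, first remove the repeated rows with DataFrame.drop_duplicates, then sum, and finally join the two results with concat: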

s1 = df.groupby('date')['population'].sum()
s2 = df.drop_duplicates(['date','country','Vitals']).groupby('date')['Vitals'].sum()
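To see why the drop_duplicates step matters, here is a small sketch that rebuilds the sample data from the question (the DataFrame construction and the names naive and deduped are my own, for illustration only) and compares a plain sum of Vitals with the deduplicated one:

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "date": ["01-01"] * 6 + ["02-01"] * 6,
    "country": ["cc1", "cc1", "cc2", "cc2", "cc3", "cc3"] * 2,
    "state": ["s1", "s2", "s3", "s4", "s5", "s6"] * 2,
    "population": [5, 4, 10, 11, 12, 12, 6, 5, 11, 12, 11, 12],
    "Vitals": [20, 20, 35, 35, 20, 20, 25, 25, 40, 40, 40, 40],
})

# Plain sum: every state row contributes, so each country's Vitals is counted twice.
naive = df.groupby("date")["Vitals"].sum()

# Keep one row per (date, country, Vitals) before summing.
deduped = (
    df.drop_duplicates(subset=["date", "country", "Vitals"])
      .groupby("date")["Vitals"]
      .sum()
)

print(naive)    # 150 and 210
print(deduped)  # 75 and 105
```

The plain sum is exactly double the wanted value here because every country has two states; with uneven state counts the distortion would differ per country.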

If there is a MultiIndex:

s1 = df.groupby('date')['population'].sum()
s2 = df.groupby(['date','country','Vitals'])['Vitals'].first().groupby('date').sum()

df = pd.concat([s1, s2], axis=1)
print(df)
       population  Vitals
date                     
01-01          54      75
02-01          57     105
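If you want the exact layout from the question (date as a regular column and the second column called "vital sums"), something along these lines should work. This is only a sketch: the sample data is rebuilt from the question, the name wide is hypothetical, and the Vitals step is a slight variation on the code above that groups by date and country only and selects the level explicitly.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "date": ["01-01"] * 6 + ["02-01"] * 6,
    "country": ["cc1", "cc1", "cc2", "cc2", "cc3", "cc3"] * 2,
    "state": ["s1", "s2", "s3", "s4", "s5", "s6"] * 2,
    "population": [5, 4, 10, 11, 12, 12, 6, 5, 11, 12, 11, 12],
    "Vitals": [20, 20, 35, 35, 20, 20, 25, 25, 40, 40, 40, 40],
})

# Population is additive: sum it per date.
s1 = df.groupby("date")["population"].sum()

# Take one Vitals value per (date, country), then sum per date.
s2 = (
    df.groupby(["date", "country"])["Vitals"].first()
      .groupby(level="date").sum()
)

# Rename the Vitals column and turn the date index back into a column.
wide = pd.concat([s1, s2.rename("vital sums")], axis=1).reset_index()
print(wide)
```

Note that first() assumes Vitals really is constant within each date and country pair. If it is not, the first value encountered is silently used.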

Edit:

Another solution, if the Vitals value is the same for each date and country combination, is to use GroupBy.agg and then sum by the first level of the MultiIndex.
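A minimal sketch of the GroupBy.agg route, which also addresses the .agg() question from the edit. It is not necessarily the exact code the answer had in mind. I am assuming that population is summed and that Vitals is taken with first within each date and country group; the sample data and the names per_country and result are my own.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "date": ["01-01"] * 6 + ["02-01"] * 6,
    "country": ["cc1", "cc1", "cc2", "cc2", "cc3", "cc3"] * 2,
    "state": ["s1", "s2", "s3", "s4", "s5", "s6"] * 2,
    "population": [5, 4, 10, 11, 12, 12, 6, 5, 11, 12, 11, 12],
    "Vitals": [20, 20, 35, 35, 20, 20, 25, 25, 40, 40, 40, 40],
})

# Step 1: one row per (date, country). Population is summed over the states,
# Vitals is taken once (assumed constant within the group).
per_country = df.groupby(["date", "country"]).agg(
    {"population": "sum", "Vitals": "first"}
)

# Step 2: sum over the first level of the resulting MultiIndex (date).
result = per_country.groupby(level=0).sum()
print(result)
```

Both columns are handled in a single agg call, so no concat is needed afterwards.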

Pandas time series aggregation
