I have a DataFrame, say df:
Country Continent PopulationEst
0 Germany Europe 8.036970e+07
1 Canada North America 35.239865+07
...
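I want to create a DataFrame that shows, for each continent, the size (the number of countries in that continent) together with the sum, mean, and standard deviation of the countries' estimated population.
I did the following: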
df2 = df.groupby('Continent').agg(['size', 'sum','mean','std'])
But the resulting df2 has multi-level columns, as shown below:
df2.columns
MultiIndex(levels=[['PopulationEst'], ['size', 'sum', 'mean', 'std']],
labels=[[0, 0, 0, 0], [0, 1, 2, 3]])
How can I remove PopulationEst from the columns, so that the DataFrame has only the ['size', 'sum', 'mean', 'std'] columns?
Answer 0 (score: 4)
I think you need to add ['PopulationEst'] before agg, so that only this column is aggregated:
df2 = df.groupby('Continent')['PopulationEst'].agg(['size', 'sum','mean','std'])
Sample:
import pandas as pd
df = pd.DataFrame({
'Country': ['Germany', 'Germany', 'Canada', 'Canada'],
'PopulationEst': [8, 4, 35, 50],
'Continent': ['Europe', 'Europe', 'North America', 'North America']},
columns=['Country','PopulationEst','Continent'])
print (df)
   Country  PopulationEst      Continent
0  Germany              8         Europe
1  Germany              4         Europe
2   Canada             35  North America
3   Canada             50  North America
df2 = df.groupby('Continent')['PopulationEst'].agg(['size', 'sum','mean','std'])
print (df2)
               size  sum  mean        std
Continent
Europe            2   12   6.0   2.828427
North America     2   85  42.5  10.606602
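# Select PopulationEst as a one-column DataFrame so the non-numeric Country column is not aggregated
# (recent pandas versions raise an error when asked for the mean of a string column)
df2 = df.groupby('Continent')[['PopulationEst']].agg(['size', 'sum','mean','std'])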
print (df2)
              PopulationEst
                       size sum  mean        std
Continent
Europe                    2  12   6.0   2.828427
North America             2  85  42.5  10.606602
Another solution is to use MultiIndex.droplevel:
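# The double brackets keep PopulationEst as the top column level and leave Country out of the aggregation
df2 = df.groupby('Continent')[['PopulationEst']].agg(['size', 'sum','mean','std'])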
df2.columns = df2.columns.droplevel(0)
print (df2)
               size  sum  mean        std
Continent
Europe            2   12   6.0   2.828427
North America     2   85  42.5  10.606602
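A third option that avoids the extra column level altogether may be named aggregation, which pandas supports from version 0.25 onward. The sketch below reuses the sample frame from this answer. The output names size, sum, mean and std are chosen only to match what the question asks for.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Germany', 'Germany', 'Canada', 'Canada'],
    'PopulationEst': [8, 4, 35, 50],
    'Continent': ['Europe', 'Europe', 'North America', 'North America'],
})

# Named aggregation: each keyword becomes an output column name and each
# value is a (source column, aggregation function) pair.
df2 = df.groupby('Continent').agg(
    size=('PopulationEst', 'size'),
    sum=('PopulationEst', 'sum'),
    mean=('PopulationEst', 'mean'),
    std=('PopulationEst', 'std'),
)

# The columns form a flat Index, so there is no level to drop afterwards.
print(df2.columns.tolist())
print(df2)
```

Because the output names are spelled out, this form also lets you rename columns in the same step (for example n_countries instead of size).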
Answer 1 (score: 0)
I think this does what you need:
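# Count rows per continent via Country; aggregate PopulationEst (the column name used in the question)
grouping = {'Country': ['size'], 'PopulationEst': ['sum', 'mean', 'std']}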
df.groupby('Continent').agg(grouping)
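Note that a dictionary of lists still produces two-level columns, so one more step is needed to get the flat ['size', 'sum', 'mean', 'std'] labels the question asks for. Here is a minimal sketch. It assumes the four-row sample frame from the first answer and a dictionary whose keys are columns of that frame (Country is used only to count rows).

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Germany', 'Germany', 'Canada', 'Canada'],
    'PopulationEst': [8, 4, 35, 50],
    'Continent': ['Europe', 'Europe', 'North America', 'North America'],
})

# Assumed dictionary: count rows via Country, summarise PopulationEst.
grouping = {'Country': ['size'], 'PopulationEst': ['sum', 'mean', 'std']}
df2 = df.groupby('Continent').agg(grouping)

# At this point the columns are (source column, function) pairs.
# Keeping only the second level leaves the function names as flat labels.
df2.columns = df2.columns.get_level_values(1)
print(df2)
```

This only works cleanly when the function names are unique across the source columns. Otherwise the flattened frame ends up with duplicate labels.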