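I have a table containing various information on 15 countries (for example, energy supply and the share of renewable energy in that supply). I have to create a DataFrame at continent level that shows the number of countries on each continent, together with the sum, mean and standard deviation of those countries' populations. The DataFrame is built from the data in the table described above. My problem is that, after mapping the 15 countries to their continents, I can't seem to aggregate the data at continent level. I have to use the predefined dictionary to solve this task. Can you help me? Please find my code below: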
def answer_eleven():
    import numpy as np
    import pandas as pd
    Top15 = answer_one()
    Top15['Country Name'] = Top15.index
    ContinentDict = {'China':'Asia',
                     'United States':'North America',
                     'Japan':'Asia',
                     'United Kingdom':'Europe',
                     'Russian Federation':'Europe',
                     'Canada':'North America',
                     'Germany':'Europe',
                     'India':'Asia',
                     'France':'Europe',
                     'South Korea':'Asia',
                     'Italy':'Europe',
                     'Spain':'Europe',
                     'Iran':'Asia',
                     'Australia':'Australia',
                     'Brazil':'South America'}
    Top15['Continent'] = pd.Series(ContinentDict)
    #Top15['size'] = Top15['Country'].count()
    Top15['Population'] = (Top15['Energy Supply'] / Top15['Energy Supply per Capita'])
    #columns_to_keep = ['Continent', 'Population']
    #Top15 = Top15[columns_to_keep]
    #Top15 = Top15.set_index('Continent').groupby(level=0)['Population'].agg({'sum': np.sum})
    Top15.set_index(['Continent'], inplace = True)
    Top15['size'] = Top15.groupby(['Continent'])['Country Name'].count()
    Top15['sum'] = Top15.groupby(['Continent'])['Population'].sum()
    Top15['mean'] = Top15.groupby(['Continent'])['Population'].mean()
    Top15['std'] = Top15.groupby(['Continent'])['Population'].std()
    columns_to_keep = ['size', 'sum', 'mean', 'std']
    Top15 = Top15[columns_to_keep]
    #Top15['Continent Name'] = Top15.index
    #Top15.groupby(['Continent'], level = 0, sort = True)['size'].count()
    return Top15.iloc[:5]

answer_eleven()
Answer 0 (score: 0)
I believe you need to group by the dictionary and then aggregate with agg:
def answer_eleven():
    Top15 = answer_one()
    ContinentDict = {'China':'Asia',
                     'United States':'North America',
                     'Japan':'Asia',
                     'United Kingdom':'Europe',
                     'Russian Federation':'Europe',
                     'Canada':'North America',
                     'Germany':'Europe',
                     'India':'Asia',
                     'France':'Europe',
                     'South Korea':'Asia',
                     'Italy':'Europe',
                     'Spain':'Europe',
                     'Iran':'Asia',
                     'Australia':'Australia',
                     'Brazil':'South America'}
    Top15['Population'] = (Top15['Energy Supply'] / Top15['Energy Supply per Capita'])
    # Passing the dict to groupby maps each index label (country) to its continent
    Top15 = Top15.groupby(ContinentDict)['Population'].agg(['size','sum','mean','std'])
    return Top15

df = answer_eleven()
print(df)
                        sum          mean           std  size
Country Name
Asia           2.771785e+09  9.239284e+08  6.913019e+08     3
Australia      2.331602e+07  2.331602e+07           NaN     1
Europe         4.579297e+08  7.632161e+07  3.464767e+07     6
North America  3.528552e+08  1.764276e+08  1.996696e+08     2
South America  2.059153e+08  2.059153e+08           NaN     1
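
To see the mechanism in isolation, here is a minimal sketch on a toy frame. The names toy and continent_of and all the numbers are made up for illustration; it assumes, as in the question, that the country names are the index. When groupby is given a dict, pandas looks up each index label in it and uses the value as the group key, so no helper 'Continent' column is needed and the result has one row per continent rather than one per country.

```python
import pandas as pd

# Hypothetical toy data; the population figures are invented for illustration.
toy = pd.DataFrame(
    {"Population": [10.0, 20.0, 30.0, 40.0]},
    index=pd.Index(["China", "Japan", "France", "Brazil"], name="Country Name"),
)

# Hypothetical, shortened version of the predefined dictionary.
continent_of = {
    "China": "Asia",
    "Japan": "Asia",
    "France": "Europe",
    "Brazil": "South America",
}

# A dict passed to groupby maps each index label to its group key.
stats = toy.groupby(continent_of)["Population"].agg(["size", "sum", "mean", "std"])
print(stats)

# Equivalent alternative: materialise the continent as a column first,
# then group by that column.
alt = (
    toy.assign(Continent=toy.index.map(continent_of))
    .groupby("Continent")["Population"]
    .agg(["size", "sum", "mean", "std"])
)
print(alt)
```

The std column is NaN for any continent with a single country, because pandas computes the sample standard deviation (ddof=1) by default, which is undefined for one observation.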