Question

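The goal is to merge three datasets (GDP, energy consumption, and the ranking data for energy engineering and energy technology) into a new dataset, using the intersection of country names.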

Only the last 10 years of GDP data (2006-2015) must be used, and only the top 15 countries by "Rank" (Rank 1 through 15).
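A minimal sketch of those two filters, assuming the GDP table stores each year as a string-named column ('2006' through '2015') and the ranking table has a numeric 'Rank' column. The toy frames and their values are hypothetical stand-ins, not the real data:

```python
import pandas as pd

# Hypothetical stand-ins for the real GDP and ranking tables.
GDP = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "2005": [1.0, 2.0, 3.0],
    "2006": [1.1, 2.1, 3.1],
    "2015": [1.9, 2.9, 3.9],
})
ScimEn = pd.DataFrame({"Country": ["A", "B", "C"], "Rank": [1, 16, 2]})

# Keep only the 2006-2015 year columns that are actually present.
years = [str(y) for y in range(2006, 2016)]
gdp_recent = GDP[["Country"] + [y for y in years if y in GDP.columns]]

# Keep only the countries ranked 1 to 15 (inclusive on both ends).
top15 = ScimEn[ScimEn["Rank"].between(1, 15)]

# Inner merge keeps countries present in both tables.
result = top15.merge(gdp_recent, on="Country", how="inner")
print(result)
```

Filtering on Rank before merging keeps the intermediate frames small, but it only yields all 15 rows if every top-15 country name also matches in the other tables.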

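The expected output is the top 15 countries with their respective attributes from the other datasets. This is what I tried when merging the three datasets: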

df_new = pd.merge(ScimEn,energy,how='inner',left_on='Country',right_on='Country')

final_df = pd.merge(df_new,GDP,how='inner',left_on='Country',right_on='Country')
final_df = final_df.set_index('Country')

But in the output I only get part of the data I need.
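One way to see which countries an inner merge drops is an outer merge with indicator=True, which labels each row as 'both', 'left_only', or 'right_only'. The sketch below uses hypothetical toy frames (the names and values are made up for illustration) rather than the real files:

```python
import pandas as pd

# Hypothetical toy data: the same countries spelled differently in each table.
left = pd.DataFrame({
    "Country": ["South Korea", "United States", "Iran"],
    "Rank": [10, 2, 13],
})
right = pd.DataFrame({
    "Country": ["Republic of Korea", "United States", "Iran (Islamic Republic of)"],
    "Energy Supply": [1.0, 2.0, 3.0],
})

# The outer merge keeps every row; the _merge column says where each row came from.
check = left.merge(right, on="Country", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]

left_only = sorted(unmatched.loc[unmatched["_merge"] == "left_only", "Country"])
right_only = sorted(unmatched.loc[unmatched["_merge"] == "right_only", "Country"])
print(left_only)
print(right_only)
```

Any name that shows up as left_only or right_only failed to match exactly, which usually points at a spelling, whitespace, or suffix difference in the key column.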

I believe the mistake is in the data cleaning done by this particular piece of code:

energy=pd.read_excel('Energy Indicators.xls', skiprows=18,header=None,skipfooter=38, na_values='...')

energy.drop([0,1], axis=1,inplace=True)
energy.rename(columns={2:"Country",3: "Energy Supply",4: "Energy Supply per Capita",5: "% Renewable"}, inplace=True)
energy['Energy Supply']=energy['Energy Supply']*1000000
energy['Country']=energy['Country'].replace({"Republic of Korea": "South Korea","United States of America": "United States","United Kingdom of Great Britain and Northern Ireland": "United Kingdom","China, Hong Kong Special Administrative Region": "Hong Kong"})
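# regex=True is needed in pandas 2.0+, where str.replace treats the pattern as a literal string by default
energy['Country'] = energy['Country'].str.replace(r" \(.*\)", "", regex=True)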
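A possible cause, stated here as an assumption rather than something verified against the file, is that some raw country names carry footnote digits or stray whitespace, so the exact-match replace never fires. A minimal sketch on hypothetical sample names that strips those first and renames afterwards:

```python
import pandas as pd

# Hypothetical sample names; whether the real file contains such suffixes is an assumption.
names = pd.Series([
    "United States of America20",
    "Bolivia (Plurinational State of)",
    "Republic of Korea ",
    "Japan10",
])

renames = {
    "Republic of Korea": "South Korea",
    "United States of America": "United States",
}

cleaned = (
    names
    .str.replace(r"\s*\(.*\)", "", regex=True)  # drop parenthesised suffixes
    .str.replace(r"\d+", "", regex=True)        # drop footnote digits
    .str.strip()                                # drop leading/trailing whitespace
    .replace(renames)                           # exact-match renames on the cleaned names
)
print(cleaned.tolist())
```

Running the renames last means they are applied to names that are already normalized, so the dictionary keys only need to match the clean spelling.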

Any help or suggestions?

Incomplete result when merging DataFrames with pandas

0 answers: