Merging three DataFrames in pandas produces only one row

Time: 2019-07-19 23:19:31

Tags: python-3.x pandas dataframe merge

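I'm working on a problem for Coursera's Introduction to Data Science, and I'm really struggling to get the 15 rows the answer requires instead of the single row I'm getting.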

The datasets can be found here:

Energy: https://qkijypphmnsnwhitxjvalj.coursera-apps.org/notebooks/Energy%20Indicators.xls

GDP:http://data.worldbank.org/indicator/NY.GDP.MKTP.CD

ScimEn:http://www.scimagojr.com/countryrank.php?category=2102

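Basically, what I need to do is import these datasets, clean them up so the country names match, and then produce a DataFrame that contains only the top 15 rows by ScimEn rank and includes all the columns from each of the three datasets.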
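
As a point of comparison, the intended result can be sketched on toy data. This is only an illustration: the three small frames below and every value in them are made up, and only the `Country` key and a few column names are borrowed from the question. It shows an inner merge chained across three frames, starting from the ranking table so that its rows define the result.

```python
import pandas as pd

# Toy stand-ins for the three datasets; every value here is invented.
energy = pd.DataFrame({"Country": ["Australia", "China", "Japan"],
                       "% Renewable": [1.0, 2.0, 3.0]})
gdp = pd.DataFrame({"Country": ["Australia", "China", "Japan"],
                    "2015": [10.0, 20.0, 30.0]})
scimen = pd.DataFrame({"Country": ["China", "Japan", "Australia"],
                       "Rank": [1, 3, 14]})

# An inner merge keeps only the countries present in every frame,
# so the result can never have more rows than the smallest overlap.
merged = (scimen.merge(energy, on="Country", how="inner")
                .merge(gdp, on="Country", how="inner")
                .sort_values("Rank")
                .set_index("Country"))
print(merged)
```

Because every toy country appears in all three frames, all three rows survive. A country missing from any one frame would drop out of the result.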

Here is my code:

import pandas as pd
import numpy as np

energy = pd.read_excel('Energy Indicators.xls',skiprows=17,skipfooter = 245,header = None)
energy = energy.drop([0, 1], axis=1).drop(0,axis = 0)
energy.columns = ['Country','Energy Supply', 'Energy Supply per Capita', '% Renewable']
energy['Country'] = energy['Country'].replace({'Australia1':'Australia','Bolivia (Plurinational State of)':'Bolivia','China2':'China','Democratic Republic of the Congo':'Congo','Denmark5':'Denmark','Falkland Islands (Malvinas)':'Falkland Islands','France6':'France','Greenland7':'Greenland','China, Hong Kong Special Administrative Region3':'Hong Kong','Indonesia8':'Indonesia','Iran (Islamic Republic of)':'Iran','Italy9':'Italy','Japan10':'Japan','Kuwait11':'Kuwait','Lao People\'s Democratic Republic':'Laos','China, Macao Special Administrative Region4':'Macao','Micronesia (Federated States of)':'Micronesia','Republic of Moldova':'Moldova','Netherlands12':'Netherlands','Democratic People\'s Republic of Korea':'North Korea','Portugal13':'Portugal','Réunion':'Reunion','Saudi Arabia14':'Saudi Arabia','Serbia15':'Serbia','Sint Maarten (Dutch part)':'Sint Maarten','Republic of Korea':'South Korea','Spain16':'Spain','Switzerland17':'Switzerland','Syrian Arab Republic':'Syria','Ukraine18':'Ukraine','United Kingdom of Great Britain and Northern Ireland19':'United Kingdom','United States of America20':'United States','Venezuela (Bolivarian Republic of)':'Venezuela','The former Yugoslav Republic of Macedonia':'Yugoslavia'})
energy['Energy Supply'] = energy['Energy Supply'].replace({'...': np.nan})
energy['Energy Supply per Capita'] = energy['Energy Supply per Capita'].replace({'...': np.nan})
energy['Energy Supply'] = energy['Energy Supply'] * 1000000

GDP = pd.read_csv('world_bank.csv',skiprows = 4)
GDP = GDP[['Country Name','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015']]
GDP.columns = ['Country','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015']
GDP['Country']= GDP['Country'].replace('Korea, Rep.','South Korea')
GDP['Country']= GDP['Country'].replace('Hong Kong SAR, China','Hong Kong')
GDP['Country']= GDP['Country'].replace('Iran, Islamic Rep.','Iran')

ScimEn = pd.read_excel('scimagojr-3.xlsx')
ScimEn = ScimEn[['Rank', 'Country', 'Documents', 'Citable documents', 'Citations', 'Self-citations', 'Citations per document', 'H index']]

ScimEn = ScimEn[:15]

new = pd.merge(energy, GDP, how="inner", left_on="Country", right_on="Country")
new = pd.merge(new, ScimEn, how="inner", left_on="Country", right_on="Country")
#new = new.sort_values('Rank',ascending=True)

print(new)

Unfortunately, this code produces only one row, for Australia:

index   Country  Energy Supply  Energy Supply per Capita % Renewable 2006          2007          2008          2009          2010   2011  2013 2014          2015  Rank Documents  Citable documents  Citations  Self-citations  Citations per document  H  
0  Australia   5.386000e+09                     231.0     11.8108 1.021939e+12  1.060340e+12  1.099644e+12  1.119654e+12  1.142251e+12 1.169431e+12   ...     1.241484e+12  1.272520e+12  1.301251e+12    14  8831               8725      90765           15606  10.28      107 

I've checked other GitHub repositories and their code looks very similar to mine, so I'm not sure what is going wrong or why I only get one row.
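
One way to narrow this down is to check, for each frame, which of the 15 ScimEn countries have no matching key. The sketch below is not a confirmed diagnosis. `missing_keys` is a hypothetical helper, and the toy frames stand in for the real data, with `energy` deliberately cut short to mimic a read that kept too few rows. In the real code, the `skiprows` and `skipfooter` values passed to `read_excel` decide how many rows of `energy` survive, so `len(energy)` is worth printing first.

```python
import pandas as pd

def missing_keys(reference, other, key="Country"):
    """Return the values of `key` in `reference` that have no match in `other`."""
    probe = reference[[key]].merge(other[[key]].drop_duplicates(),
                                   on=key, how="left", indicator=True)
    # indicator=True adds a `_merge` column; "left_only" marks unmatched keys.
    return probe.loc[probe["_merge"] == "left_only", key].tolist()

# Toy frames (invented): `energy` pretends the file read stopped early.
scimen = pd.DataFrame({"Country": ["China", "United States", "Australia"]})
energy = pd.DataFrame({"Country": ["Argentina", "Australia", "Austria"]})
gdp = pd.DataFrame({"Country": ["China", "United States", "Australia", "Austria"]})

# Row counts first: a frame that is much shorter than expected is suspicious.
print(len(energy), len(gdp), len(scimen))

missing_in_energy = missing_keys(scimen, energy)
missing_in_gdp = missing_keys(scimen, gdp)
print(missing_in_energy)  # ['China', 'United States']
print(missing_in_gdp)     # []
```

If the missing list for one frame is long, either that frame lost rows while loading or its country names still differ from the ScimEn spelling.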

Any help would be greatly appreciated.

0 answers:

No answers yet