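I'm working on a problem for Coursera's Introduction to Data Science, and I'm really struggling to get the 15 rows the answer requires instead of the single row I end up with.
The datasets can be found here:
Energy: https://qkijypphmnsnwhitxjvalj.coursera-apps.org/notebooks/Energy%20Indicators.xls
GDP: http://data.worldbank.org/indicator/NY.GDP.MKTP.CD
ScimEn: http://www.scimagojr.com/countryrank.php?category=2102
Basically, what I need to do is import these datasets, clean them up so that the country names match, and then produce a result that covers only the top 15 rows of ScimEn and contains the data from all columns of each of the three datasets.
Here is my code: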
import pandas as pd
import numpy as np
energy = pd.read_excel('Energy Indicators.xls', skiprows=17, skipfooter=245, header=None)
energy = energy.drop([0, 1], axis=1).drop(0, axis=0)
energy.columns = ['Country','Energy Supply', 'Energy Supply per Capita', '% Renewable']
energy['Country'] = energy['Country'].replace({'Australia1':'Australia','Bolivia (Plurinational State of)':'Bolivia','China2':'China','Democratic Republic of the Congo':'Congo','Denmark5':'Denmark','Falkland Islands (Malvinas)':'Falkland Islands','France6':'France','Greenland7':'Greenland','China, Hong Kong Special Administrative Region3':'Hong Kong','Indonesia8':'Indonesia','Iran (Islamic Republic of)':'Iran','Italy9':'Italy','Japan10':'Japan','Kuwait11':'Kuwait','Lao People\'s Democratic Republic':'Laos','China, Macao Special Administrative Region4':'Macao','Micronesia (Federated States of)':'Micronesia','Republic of Moldova':'Moldova','Netherlands12':'Netherlands','Democratic People\'s Republic of Korea':'North Korea','Portugal13':'Portugal','Réunion':'Reunion','Saudi Arabia14':'Saudi Arabia','Serbia15':'Serbia','Sint Maarten (Dutch part)':'Sint Maarten','Republic of Korea':'South Korea','Spain16':'Spain','Switzerland17':'Switzerland','Syrian Arab Republic':'Syria','Ukraine18':'Ukraine','United Kingdom of Great Britain and Northern Ireland19':'United Kingdom','United States of America20':'United States','Venezuela (Bolivarian Republic of)':'Venezuela','The former Yugoslav Republic of Macedonia':'Yugoslavia'})
energy['Energy Supply'] = energy['Energy Supply'].replace({'...': np.nan})
energy['Energy Supply per Capita'] = energy['Energy Supply per Capita'].replace({'...': np.nan})
energy['Energy Supply'] = energy['Energy Supply'] * 1000000
GDP = pd.read_csv('world_bank.csv', skiprows=4)
GDP = GDP[['Country Name','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015']]
GDP.columns = ['Country','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015']
GDP['Country'] = GDP['Country'].replace('Korea, Rep.', 'South Korea')
GDP['Country'] = GDP['Country'].replace('Hong Kong SAR, China', 'Hong Kong')
GDP['Country'] = GDP['Country'].replace('Iran, Islamic Rep.', 'Iran')
ScimEn = pd.read_excel('scimagojr-3.xlsx')
ScimEn = ScimEn[['Rank', 'Country', 'Documents', 'Citable documents', 'Citations', 'Self-citations', 'Citations per document', 'H index']]
ScimEn = ScimEn[:15]
new = pd.merge(energy, GDP, how="inner", left_on="Country", right_on="Country")
new = pd.merge(new, ScimEn, how="inner", left_on="Country", right_on="Country")
#new = new.sort_values('Rank',ascending=True)
print(new)
Unfortunately, this code produces only one row, for Australia:
  Country    Energy Supply  Energy Supply per Capita  % Renewable  2006          2007          2008          2009          2010          2011          ...  2013          2014          2015          Rank  Documents  Citable documents  Citations  Self-citations  Citations per document  H index
0 Australia  5.386000e+09   231.0                     11.8108      1.021939e+12  1.060340e+12  1.099644e+12  1.119654e+12  1.142251e+12  1.169431e+12  ...  1.241484e+12  1.272520e+12  1.301251e+12  14    8831       8725               90765      15606           10.28                   107
I've checked other GitHub repositories and their solutions look very similar to mine, so I'm not sure where this went wrong or why I'm only getting one row.
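One way I could narrow down where the rows disappear is to check, for each of the top 15 ScimEn countries, whether it actually exists in the other two tables before doing the inner merges. The sketch below does that on small made-up tables rather than the real files; `scim_top`, `energy_toy`, `gdp_toy` and the helper `missing_keys` are hypothetical names used only for illustration.

```python
import pandas as pd

# Toy stand-ins for the real tables (hypothetical data, not the course files).
scim_top = pd.DataFrame({"Country": ["China", "United States", "Australia"],
                         "Rank": [1, 2, 14]})
energy_toy = pd.DataFrame({"Country": ["Australia", "Albania"],
                           "% Renewable": [11.8, 100.0]})
gdp_toy = pd.DataFrame({"Country": ["China", "United States", "Australia"],
                        "2015": [1.0, 2.0, 3.0]})


def missing_keys(left, right, key="Country"):
    """Return the key values of `left` that have no match in `right`."""
    return sorted(left.loc[~left[key].isin(right[key]), key])


missing_in_energy = missing_keys(scim_top, energy_toy)
missing_in_gdp = missing_keys(scim_top, gdp_toy)
print("rows in energy:", len(energy_toy))
print("missing from energy:", missing_in_energy)
print("missing from GDP:", missing_in_gdp)

# A left merge with indicator=True keeps every ScimEn row and adds a
# `_merge` column saying whether a match was found in the other table.
check = scim_top.merge(energy_toy, on="Country", how="left", indicator=True)
print(check[["Country", "_merge"]])
```

If most of the countries turn out to be missing from one table, the row count of that table right after loading would be the next thing to look at, for example whether `skiprows` or `skipfooter` trimmed away more rows than intended, or whether some country names still differ between the tables.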
Thank you very much for your help.