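I have the following CSV file.
After reading it into a DataFrame with Python, I want to reshape it to:
Country, YEAR1, value
Country, YEAR2, value
keeping only the country, the year and the value, so that I end up with 3 variables.
You can find the dataset here: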
http://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG
Answer 0 (score: 3)
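You can use:
read_csv with the skiprows parameter, to skip the 4 lines before the header
isnull with all, to check that the last column contains only NaN
iloc, to select the first 61 columns
set_index
stack
reset_index
rename, to rename the new column using a dict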
df = pd.read_csv('API_NY.GDP.MKTP.KD.ZG_DS2_en_csv_v2.csv', skiprows=4)
print (df.head())
Country Name Country Code Indicator Name Indicator Code 1960 \
0 Aruba ABW GDP growth (annual %) NY.GDP.MKTP.KD.ZG NaN
1 Afghanistan AFG GDP growth (annual %) NY.GDP.MKTP.KD.ZG NaN
2 Angola AGO GDP growth (annual %) NY.GDP.MKTP.KD.ZG NaN
3 Albania ALB GDP growth (annual %) NY.GDP.MKTP.KD.ZG NaN
4 Andorra AND GDP growth (annual %) NY.GDP.MKTP.KD.ZG NaN
1961 1962 1963 1964 1965 ... 2008 2009 2010 \
0 NaN NaN NaN NaN NaN ... -6.881302 -5.653502 NaN
1 NaN NaN NaN NaN NaN ... 3.611368 21.020649 8.433290
2 NaN NaN NaN NaN NaN ... 13.817146 2.412870 3.407655
3 NaN NaN NaN NaN NaN ... 7.530000 3.350000 3.710000
4 NaN NaN NaN NaN NaN ... -8.594256 -3.817986 -5.347977
2011 2012 2013 2014 2015 2016 Unnamed: 61
0 NaN NaN NaN NaN NaN NaN NaN
1 6.113685 14.434741 1.959123 1.312531 1.112558 2.232272 NaN
2 3.918597 5.155441 6.813586 4.804473 3.006981 0.000000 NaN
3 2.550000 1.420000 1.110000 1.800000 2.590000 3.460000 NaN
4 -4.802675 -1.760010 -0.063514 NaN NaN NaN NaN
cols = ['Country Name','Country Code','Indicator Name','Indicator Code']
print (df.iloc[:, 61].isnull().all())
True
d = {'level_4':'year'}
# keep the 4 id columns plus the years 1960-2016, dropping the empty 'Unnamed: 61'
df = (df.iloc[:, :61]
        .set_index(cols)
        .stack()
        .reset_index(name='vals')
        .rename(columns=d))
print (df.head())
Country Name Country Code Indicator Name Indicator Code year \
0 Aruba ABW GDP growth (annual %) NY.GDP.MKTP.KD.ZG 1995
1 Aruba ABW GDP growth (annual %) NY.GDP.MKTP.KD.ZG 1996
2 Aruba ABW GDP growth (annual %) NY.GDP.MKTP.KD.ZG 1997
3 Aruba ABW GDP growth (annual %) NY.GDP.MKTP.KD.ZG 1998
4 Aruba ABW GDP growth (annual %) NY.GDP.MKTP.KD.ZG 1999
vals
0 1.245086
1 7.814432
2 6.666622
3 1.154469
4 4.514062
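As a variation on the isnull().all() check and the positional iloc slice above, an all-empty column can also be dropped by content rather than by position with dropna(axis=1, how='all'). The sketch below uses a small inline CSV instead of the real file; the layout (a trailing comma on every line, which is assumed to be what produces the extra 'Unnamed' column) and the variable names raw and cleaned are illustrative assumptions, and the two Albania values are copied from the printed output above.

```python
import io
import pandas as pd

# Toy CSV imitating the assumed layout of the World Bank file: every line
# ends with a trailing comma, so pandas reads one extra, empty column.
raw = (
    "Country Name,Country Code,2015,2016,\n"
    "Aruba,ABW,,,\n"
    "Albania,ALB,2.59,3.46,\n"
)
df = pd.read_csv(io.StringIO(raw))
print(df.columns.tolist())

# Drop only the columns in which every value is NaN.
# The year columns survive because Albania has values in them.
cleaned = df.dropna(axis=1, how='all')
print(cleaned.columns.tolist())
```

This avoids counting columns by hand, but it would also remove a year column if no country had a value for that year, so it is not an exact replacement for the slice.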
A very similar solution for 3 columns, using drop to remove the unneeded columns:
df = pd.read_csv('API_NY.GDP.MKTP.KD.ZG_DS2_en_csv_v2.csv', skiprows=4)
d = {'level_1':'year'}
df= df.drop(['Country Code','Indicator Name','Indicator Code', 'Unnamed: 61'], axis=1)
df = df.set_index('Country Name').stack().reset_index(name='vals').rename(columns=d)
print (df.head())
Country Name year vals
0 Aruba 1995 1.245086
1 Aruba 1996 7.814432
2 Aruba 1997 6.666622
3 Aruba 1998 1.154469
4 Aruba 1999 4.514062
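The same wide-to-long reshape can probably also be written with melt, which names the new columns directly instead of renaming 'level_1' afterwards. This is only a sketch on a toy table, not the real file: the frame wide and the result long_df are made-up names, and the two Albania values are copied from the printed output above. Missing values are dropped explicitly with dropna, so the result does not depend on how stack treats NaN.

```python
import pandas as pd

# Toy wide table standing in for the file after the extra columns are dropped.
wide = pd.DataFrame({
    "Country Name": ["Aruba", "Albania"],
    "2015": [float("nan"), 2.59],
    "2016": [float("nan"), 3.46],
})

long_df = (
    wide.melt(id_vars="Country Name", var_name="year", value_name="vals")
        .dropna(subset=["vals"])                 # drop country/year pairs without a value
        .sort_values(["Country Name", "year"])   # group each country's years together
        .reset_index(drop=True)
)

# The year labels come from the column names, so they are strings;
# convert them to integers if they are needed as numbers.
long_df["year"] = long_df["year"].astype(int)
print(long_df)
```

melt returns the rows year by year rather than country by country, which is why the sort is added here to get the same ordering as the stack version.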
Answer 1 (score: 2)
You can try this:
object ScalaMain {
def main(args: Array[String]) = {
var vec = new Vector2D(1, 2);
println(vec.x);
}
}