Pivoting a DataFrame in Python

Time: 2017-07-09 10:48:01

Tags: python pandas pivot

I have the following CSV file:

[image: screenshot of the CSV file]

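After reading it into a DataFrame with Python, I want to reshape it into:
Country, YEAR1, value
Country, YEAR2, value

That is, I only want the country, the year and the value, so I would end up with 3 variables.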

You can find the dataset here:
http://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG

2 answers:

Answer 0: (score: 3)

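You can use:

  • read_csv with the skiprows parameter to omit the first 4 rows
  • isnull and all to check whether the last column contains only NaN values
  • all of its values are NaN, so you can omit it; one possible solution is to use iloc to select the first 61 columns
  • set_index
  • stack to reshape
  • finally reset_index, plus rename with a dict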
import pandas as pd

df = pd.read_csv('API_NY.GDP.MKTP.KD.ZG_DS2_en_csv_v2.csv', skiprows=4)
print (df.head())
  Country Name Country Code         Indicator Name     Indicator Code  1960  \
0        Aruba          ABW  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG   NaN   
1  Afghanistan          AFG  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG   NaN   
2       Angola          AGO  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG   NaN   
3      Albania          ALB  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG   NaN   
4      Andorra          AND  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG   NaN   

   1961  1962  1963  1964  1965     ...            2008       2009      2010  \
0   NaN   NaN   NaN   NaN   NaN     ...       -6.881302  -5.653502       NaN   
1   NaN   NaN   NaN   NaN   NaN     ...        3.611368  21.020649  8.433290   
2   NaN   NaN   NaN   NaN   NaN     ...       13.817146   2.412870  3.407655   
3   NaN   NaN   NaN   NaN   NaN     ...        7.530000   3.350000  3.710000   
4   NaN   NaN   NaN   NaN   NaN     ...       -8.594256  -3.817986 -5.347977   

       2011       2012      2013      2014      2015      2016  Unnamed: 61  
0       NaN        NaN       NaN       NaN       NaN       NaN          NaN  
1  6.113685  14.434741  1.959123  1.312531  1.112558  2.232272          NaN  
2  3.918597   5.155441  6.813586  4.804473  3.006981  0.000000          NaN  
3  2.550000   1.420000  1.110000  1.800000  2.590000  3.460000          NaN  
4 -4.802675  -1.760010 -0.063514       NaN       NaN       NaN          NaN
cols = ['Country Name','Country Code','Indicator Name','Indicator Code']

print (df.iloc[:, 61].isnull().all())
True    

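d = {'level_4':'year'}
# keep the first 61 columns (4 identifier columns plus 1960-2016), dropping 'Unnamed: 61';
# the parentheses are needed so the chained calls can span several lines
df = (df.iloc[:, :61]
        .set_index(cols)
        .stack()
        .reset_index(name='vals')
        .rename(columns=d))
print (df.head())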
  Country Name Country Code         Indicator Name     Indicator Code  year  \
0        Aruba          ABW  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG  1995   
1        Aruba          ABW  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG  1996   
2        Aruba          ABW  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG  1997   
3        Aruba          ABW  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG  1998   
4        Aruba          ABW  GDP growth (annual %)  NY.GDP.MKTP.KD.ZG  1999   

       vals  
0  1.245086  
1  7.814432  
2  6.666622  
3  1.154469  
4  4.514062
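
To try these steps without downloading the file, here is a minimal sketch on a made-up in-memory sample. The sample text, the values and the names raw and long_df are assumptions for illustration only, not real World Bank data. It also swaps in two techniques the answer does not use: dropna(axis=1, how='all') instead of the positional iloc selection, and rename_axis to name the year level, so it does not depend on the column count or on the auto-generated 'level_4' label.

```python
import io
import pandas as pd

# Made-up miniature of the file layout (not real data): 4 preamble lines,
# then a header and rows that each end with a trailing comma.
raw = (
    "preamble line 1\n"
    "preamble line 2\n"
    "preamble line 3\n"
    "preamble line 4\n"
    "Country Name,Country Code,Indicator Name,Indicator Code,2014,2015,2016,\n"
    "Aruba,ABW,GDP growth (annual %),NY.GDP.MKTP.KD.ZG,,1.2,,\n"
    "Angola,AGO,GDP growth (annual %),NY.GDP.MKTP.KD.ZG,4.8,3.0,0.0,\n"
)

cols = ["Country Name", "Country Code", "Indicator Name", "Indicator Code"]

df = pd.read_csv(io.StringIO(raw), skiprows=4)
# The trailing comma produces an empty 'Unnamed: 7' column; drop all-NaN columns.
df = df.dropna(axis=1, how="all")

long_df = (
    df.set_index(cols)
      .rename_axis(columns="year")  # the stacked level will be called 'year'
      .stack()
      .dropna()                     # explicit: stack's NaN handling differs across pandas versions
      .reset_index(name="vals")
)
print(long_df)
```

One caveat with dropna(axis=1, how='all'): it would also drop a year column that happens to be empty for every country, which may or may not be what you want.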

A very similar solution gives just 3 columns; use drop to remove the columns you don't need:

df = pd.read_csv('API_NY.GDP.MKTP.KD.ZG_DS2_en_csv_v2.csv', skiprows=4)

d = {'level_1':'year'}
df = df.drop(['Country Code','Indicator Name','Indicator Code', 'Unnamed: 61'], axis=1)
df = df.set_index('Country Name').stack().reset_index(name='vals').rename(columns=d)
print (df.head())
  Country Name  year      vals
0        Aruba  1995  1.245086
1        Aruba  1996  7.814432
2        Aruba  1997  6.666622
3        Aruba  1998  1.154469
4        Aruba  1999  4.514062
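
For comparison, the same wide-to-long reshape can probably also be written with melt. This is only a sketch on a made-up frame: the wide and long_df names and the numbers are assumptions, not taken from the dataset. Unlike the stack version above, melt keeps NaN rows, so they are dropped explicitly here.

```python
import pandas as pd

# Made-up wide sample in the same shape as the loaded file (not real data).
wide = pd.DataFrame({
    "Country Name": ["Aruba", "Angola"],
    "Country Code": ["ABW", "AGO"],
    "2015": [float("nan"), 3.0],
    "2016": [1.5, 0.0],
})

long_df = (
    wide.drop(columns=["Country Code"])
        .melt(id_vars="Country Name", var_name="year", value_name="vals")
        .dropna(subset=["vals"])                 # melt keeps NaN rows, so remove them
        .sort_values(["Country Name", "year"])   # group each country's years together
        .reset_index(drop=True)
)
print(long_df)
```

The year column holds strings here because it comes from the column labels; long_df['year'].astype(int) would convert it if you need numbers.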

Answer 1: (score: 2)

You can try this:

object ScalaMain {
    def main(args: Array[String]) = {
        var vec = new Vector2D(1, 2);
        println(vec.x);
    }
}