Stacking columns in a pandas DataFrame to get a record format

Time: 2016-01-03 02:05:05

Tags: python pandas

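I have a DataFrame whose first column is the country name and whose next 12 columns are annual GDP figures (the column headers are '1999', '2000', '2001', etc.):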

import pandas as pd
gdp = pd.read_csv('gdp.csv')
gdp.head()
  Country Name        1999        2000        2001         2002         2003  \
0        Aruba  1722798883  1873452514  1920262570   1941094972   2021301676
1      Andorra  1239840270  1401694156  1484004617   1717563533   2373836214 
2  Afghanistan         NaN         NaN  2461666315   4128818042   4583648922
3       Angola  6152936539  9129634978  8936063723  15285594828  17812704825
4      Albania  3414760915  3632043908  4060758804   4435078648   5746945913

         2004         2005         2006         2007         2008  \
0  2228279330   2331005587   2421474860   2623726257   2791960894
1  2916913449   3248134607   3536451646   4010785102   4001349340
2  5285461999   6275076016   7057598407   9843842455  10190529882
3 23552047248  36970918699  52381006892  65266452081  88538611205
4  7314865176   8158548717   8992642349  10701011896  12881352688

         2009         2010
0  2498932961   2467703911
1  3649863493   3346317329
2 12486943506  15936800636
3 73157893410  83369475451
4 12044212904  11926953259

How can I stack the table so that I have one column for the country name, one for the year, and one for the GDP figure? Here is my code so far:

gdp_s = gdp.stack()
gdp_s.head(20)

This results in:

0  Country Name           Aruba
   1999            1.722799e+09
   2000            1.873453e+09
   2001            1.920263e+09
   2002            1.941095e+09
   2003            2.021302e+09
   2004            2.228279e+09
   2005            2.331006e+09
   2006            2.421475e+09
   2007            2.623726e+09
   2008            2.791961e+09
   2009            2.498933e+09
   2010            2.467704e+09
1  Country Name         Andorra
   1999             1.23984e+09
   2000            1.401694e+09
   2001            1.484005e+09
   2002            1.717564e+09
   2003            2.373836e+09
   2004            2.916913e+09
dtype: object

Ultimately I am looking for something like this:

Country Name    Year    GDP
Aruba           1999    1.722799e+09
Aruba           2000    1.873453e+09
Aruba           2001    1.920263e+09
Aruba           2002    1.941095e+09    
etc...

Obviously I am new to Python and pandas. Any help would be greatly appreciated.

1 answer:

Answer 0 (score: 3)

You can use pd.melt and then sort_values (df below stands for the DataFrame from the question, which is named gdp there):

>>> d2 = pd.melt(df, id_vars="Country Name", var_name="Year", value_name="GDP")
>>> d2 = d2.sort_values(["Country Name", "Year"]).reset_index(drop=True)
>>> d2.head(10)
  Country Name  Year         GDP
0  Afghanistan  1999         NaN
1  Afghanistan  2000         NaN
2  Afghanistan  2001  2461666315
3  Afghanistan  2002  4128818042
4  Afghanistan  2003  4583648922
5      Albania  1999  3414760915
6      Albania  2000  3632043908
7      Albania  2001  4060758804
8      Albania  2002  4435078648
9      Albania  2003  5746945913
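
As a side note, the stack() attempt from the question can also be made to work. The output there mixes country names with numbers because "Country Name" was still an ordinary column when stack() was called, so it was folded in along with the year columns. The following is a minimal sketch of that alternative, not part of the original answer: the small inline frame is a stand-in for gdp.csv built from three of the rows shown in the question, and the name gdp_long is made up for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the question's gdp.csv: three countries, three of the year columns.
gdp = pd.DataFrame({
    "Country Name": ["Aruba", "Andorra", "Afghanistan"],
    "1999": [1722798883, 1239840270, np.nan],
    "2000": [1873452514, 1401694156, np.nan],
    "2001": [1920262570, 1484004617, 2461666315],
})

# Move the country into the index first, so stack() folds only the year columns.
# Whether stack() drops NaN on its own depends on the pandas version,
# so dropna() is called explicitly to make the behaviour the same everywhere.
stacked = gdp.set_index("Country Name").stack().dropna()

# Name the two index levels, then turn them back into regular columns.
gdp_long = stacked.rename_axis(["Country Name", "Year"]).reset_index(name="GDP")

# The years come from column headers, so they are strings until converted.
gdp_long["Year"] = gdp_long["Year"].astype(int)

print(gdp_long)
```

Unlike the pd.melt version above, this sketch removes the rows with missing GDP values (Afghanistan in 1999 and 2000); leave out dropna() and check the result on your pandas version if you need to keep them. The rows also come out in the original row order rather than sorted by country.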