Creating additional header levels in Python (pandas)

Time: 2017-08-02 09:39:57

Tags: python python-2.7 pandas dataframe multi-index

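I am new to programming and am currently working with DataFrames. I am trying to stack my current DataFrame into a specific layout. I am working with larger files that contain a lot of data, but I cannot get stack() to reshape the data the way I want, and the resulting shape is completely scrambled. I need help defining a MultiIndex so that the columns get more header levels.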

I hope you can help me. I have posted an example.

The result I get from my code (before stack()):

    Exports      NaN      NaN      NaN      Net Exports       NaN      NaN  
0      Total   Sweden   Norway  Germany        Total   Sweden   Norway    
1     1032.8      358    239.7    435.1        636.8    274.1      9.7   
2     1198.8    556.4    211.8    430.6        846.3    522.6     -1.1

With stack():

     Exports            Total
     NaN               Sweden
     NaN               Norway
     NaN              Germany
     Net Exports        Total
     NaN               Sweden
     NaN               Norway
     NaN              Germany
     NaN                  GWh
1    Exports           1032.8
     NaN                  358
     NaN                239.7
     NaN                435.1
     Net Exports        636.8
     NaN                274.1
     NaN                  9.7
     NaN                  353

Thanks in advance for your help.

1 Answer:

Answer 0 (score: 1)

I think you need:

print (r.head())
    Unnamed: 18 Unnamed: 19 Unnamed: 20 Unnamed: 21   Unnamed: 22 Unnamed: 23  \
0       Exports         NaN         NaN         NaN  Net Exports          NaN   
2         Total      Sweden      Norway     Germany         Total      Sweden   
189      1032.8         358       239.7       435.1         636.8       274.1   
190      1198.8       556.4       211.8       430.6         846.3       522.6   
191       982.7       159.3       166.2       657.2         276.3      -156.8   

    Unnamed: 24 Unnamed: 25     Unit:  
0           NaN         NaN       NaN  
2        Norway     Germany       GWh  
189         9.7         353   January  
190        -1.1       324.8  February  
191      -105.9         539     March  
# create the index from the column 'Unit:'
r = r.set_index('Unit:')
# create a MultiIndex from the first and second rows;
# NaNs in the first row are replaced by ffill (forward filling)
r.columns = pd.MultiIndex.from_arrays([r.iloc[0].ffill(), r.iloc[1]], names=(None, None))
# remove the first and second rows
r = r.iloc[2:]

print (r.head())
         Exports                       Net Exports                       
           Total Sweden Norway Germany        Total Sweden Norway Germany
Unit:                                                                    
January   1032.8    358  239.7   435.1        636.8  274.1    9.7     353
February  1198.8  556.4  211.8   430.6        846.3  522.6   -1.1   324.8
March      982.7  159.3  166.2   657.2        276.3 -156.8 -105.9     539
April      962.3   22.1     62   878.2       -268.6 -741.3 -352.9   825.6
May        951.2   13.5   15.9   921.8       -511.5 -885.2 -496.4   870.1

print (r.stack().head(10))
                 Exports Net Exports 
Unit:                                
January  Germany   435.1          353
         Norway    239.7          9.7
         Sweden      358        274.1
         Total    1032.8        636.8
February Germany   430.6        324.8
         Norway    211.8         -1.1
         Sweden    556.4        522.6
         Total    1198.8        846.3
March    Germany   657.2          539
         Norway    166.2       -105.9
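
For anyone who wants to try the steps above without the original file, here is a minimal, self-contained sketch. The frame `raw`, its column names `c0`..`c3`, and the reduced set of countries are made up for illustration; only the `Unit:` column name and the sample values come from the question. The `astype(float)` call is an extra step that is not in the answer, added on the assumption that the values arrive as objects because the header rows were read in as data.

```python
import numpy as np
import pandas as pd

# Hypothetical raw frame: the two header rows are stored as ordinary data rows,
# as in the question. Column names c0..c3 are placeholders.
raw = pd.DataFrame(
    [
        ["Exports", np.nan, "Net Exports", np.nan, np.nan],
        ["Total", "Sweden", "Total", "Sweden", "GWh"],
        [1032.8, 358, 636.8, 274.1, "January"],
        [1198.8, 556.4, 846.3, 522.6, "February"],
    ],
    columns=["c0", "c1", "c2", "c3", "Unit:"],
)

# Use the month column as the row index.
r = raw.set_index("Unit:")

# First header row: forward-fill the NaNs so every column has a group label.
top = r.iloc[0].ffill()
# Second header row: the per-country labels.
sub = r.iloc[1]

# Build the two-level column index and drop the two header rows from the data.
r.columns = pd.MultiIndex.from_arrays([top, sub], names=[None, None])
# Assumption: convert the remaining object values to floats.
r = r.iloc[2:].astype(float)

# Move the inner column level (countries) into the row index.
stacked = r.stack()
print(stacked)
```

The order of the rows after `stack()` can differ between pandas versions, so it is safer to select values by label, for example `stacked.loc[("January", "Sweden"), "Exports"]`, than by position.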