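pandas: looping through tables in an Excel sheet

Asked: 2016-04-11 18:18:10

Tags: excel loops pandas multi-index

I am trying to loop through a set of tables in a specific way, but I am stuck.

My table is multi-indexed and looks like this: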

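import pandas as pd

# Read the Excel sheet; data_file is the path to the workbook.
# The first two rows become the column MultiIndex, the first two columns the row MultiIndex.
df = pd.read_excel(data_file,
                   header=[0,1],
                   index_col=[0,1])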

                           T        Gender                  Age                         
                         Total   Male Female 16-24 25-34 35-44 45-54 55-75 
Q1. Are you?  Yes         17.5   26.8   23.4  13.7  20.7   100     -  17.6    
              No          17.5   26.8   23.4  13.7  20.7   100  11.5  22.6 
              Don’t know  17.5   26.8   23.4  13.7  20.7   100     -     -
Q2. Are you?  Yes         18.5   26.8   23.4  13.7  20.7   100     -  17.6    
              No          17.5   22.8   23.4  13.7  20.7   100  11.5  22.6 
              Don’t know  17.5   26.8   23.4  13.7  20.7   100     -     -

I want to loop over these indexes and columns and print each block like this:

                           T                             
                          Total   
Q1. Are you?  Yes         17.5    
              No          17.5  
              Don’t know  17.5 

                            Gender                                              
                          Male Female 
Q1. Are you?  Yes         26.8   23.4  
              No          26.8   23.4  
              Don’t know  26.8   23.4 

and so on...

So far my code groups by the outer index, which lets me loop downwards, but I cannot work out how to move across the columns...?

for outerside_grp, innerside_grp in df.groupby(level=0):
    print(innerside_grp)

Update

The code below more or less does what I want (thanks to Joshua Baboo), but now I am wondering whether it is the most efficient approach?

for row in df.index.levels[0]:
    for col in df.columns.levels[0]:
        print(df.loc[row:row, col])

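1 answer:

Answer 0 (score: 1)

As you said:

'My table is multi-indexed'

Assuming groupby(level=0) is not needed, since the original DataFrame has a 2-level MultiIndex on both the row and column axes, see whether the following example serves your purpose: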

import numpy as np
import pandas as pd
print('pandas-version: ', pd.__version__)

# Build a sample frame with a 2-level MultiIndex on both rows and columns
l1 = ['r0_1', 'r0_2']
l2 = sorted(['r1_1','r1_2','r1_3'])
c1 = ['c0_1', 'c0_2', 'c0_3']
c2 = ['c1_1', 'c1_2', 'c1_3']
nrows = len(l1) * len(l2)
ncols = len(c1) * len(c2)
df = pd.DataFrame(np.random.random( nrows * ncols).reshape(nrows, ncols),
                 index=pd.MultiIndex.from_product([l1, l2],
                                                 names=['one','two']),
                 columns=pd.MultiIndex.from_product([c1, c2]))
l_all = slice(None)

# updated loop only over columns.levels[0]
# to get all rows for each column group
for col0 in df.columns.levels[0]:
    print(df.loc(axis=1)[col0,:])

Output

pandas-version:  0.15.2
               c0_1                    
               c1_1      c1_2      c1_3
one  two                               
r0_1 r1_1  0.177051  0.159676  0.677900
     r1_2  0.980404  0.441649  0.763252
     r1_3  0.631876  0.724937  0.158891
r0_2 r1_1  0.856933  0.432360  0.690534
     r1_2  0.568308  0.381117  0.430071
     r1_3  0.680781  0.795433  0.378414
               c0_2                    
               c1_1      c1_2      c1_3
one  two                               
r0_1 r1_1  0.275005  0.266315  0.326656
     r1_2  0.841370  0.197737  0.215751
     r1_3  0.511860  0.007003  0.509688
r0_2 r1_1  0.170542  0.577844  0.616402
     r1_2  0.440131  0.497631  0.628281
     r1_3  0.061970  0.192166  0.687346
               c0_3                    
               c1_1      c1_2      c1_3
one  two                               
r0_1 r1_1  0.308490  0.372552  0.275818
     r1_2  0.718901  0.784083  0.839253
     r1_3  0.357739  0.821503  0.336578
r0_2 r1_1  0.758157  0.248164  0.983741
     r1_2  0.498885  0.972781  0.922519
     r1_3  0.107162  0.364109  0.591648

Reference: the pandas documentation for .loc(axis=0)
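The question asks for one block per question and per column group, which the loop above only covers on the column side. A minimal sketch of one way to get both is shown below. It is not part of the original answer: the frame is hypothetical stand-in data shaped like the question's table, and the names `row_outer`, `col_outer` and `blocks` are made up for illustration. It selects with boolean masks built from `get_level_values(0)`, so it does not depend on the index being sorted.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data shaped like the question's table:
# rows are (question, answer), columns are (group, item).
rows = pd.MultiIndex.from_product(
    [["Q1. Are you?", "Q2. Are you?"], ["Yes", "No", "Don't know"]]
)
cols = pd.MultiIndex.from_tuples(
    [("T", "Total"), ("Gender", "Male"), ("Gender", "Female")]
)
df = pd.DataFrame(np.arange(18.0).reshape(6, 3), index=rows, columns=cols)

# Outer-level label of every row and every column, in table order.
row_outer = df.index.get_level_values(0)
col_outer = df.columns.get_level_values(0)

# One sub-table per (question, column group) pair; both header levels are kept.
blocks = {}
for question in row_outer.unique():
    for group in col_outer.unique():
        blocks[(question, group)] = df.loc[row_outer == question,
                                           col_outer == group]

for block in blocks.values():
    print(block)
    print()
```

Iterating over `get_level_values(0).unique()` rather than `levels[0]` keeps the groups in the order they appear in the sheet (T before Gender here), whereas `levels[0]` is usually stored sorted and can also contain labels that are no longer used. Whether this is faster than the nested `.loc` loop from the update has not been measured here.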