How do I add a third index level for replicates to this pandas DataFrame?

Posted: 2017-06-27 13:14:10

Tags: pandas dataframe indexing

I have a pandas DataFrame like this:

symbol        ABHD17C    ADAM19    ADAMTS1    ADAMTS1        ADM        ADM  
Control 15   8.118767  7.418533  10.352104  11.489224  10.868479  11.214037   
        15   8.035596  7.623869  10.482607  11.618165  10.569137  11.186141   
        15   8.066988  7.480469  10.274919  11.554862  10.744955  11.225492   
        15   8.047028  7.547017  10.318972  11.499156  10.831416  11.203028   
        15   8.153588  7.507617  10.372795  11.526810  10.844520  11.125373   
        15   7.888492  7.494270  10.295306  11.542063  10.750984  11.157360   
        30   8.021081  7.417150  10.266634  11.481706  10.685487  11.060077   
        30   8.111274  7.430963  10.209418  11.426212  10.662505  11.097215   
        30   8.052534  7.359658  10.115304  11.432636  10.597524  11.004115   
        30   8.010348  7.391775  10.266453  11.546372  10.572780  11.011062   
        30   8.012789  7.420152  10.303099  11.472436  10.708051  11.084203   
        30   8.074147  7.331148  10.273618  11.430159  10.649350  11.144293   
        60   7.989894  7.419103  10.248334  11.345593  10.541806  10.926970   
        60   8.077109  7.417790  10.090361  11.289033  10.458097  10.958197   
        60   7.983865  7.492862  10.129944  11.418077  10.522366  10.873455   
        60   8.053574  7.441354  10.138749  11.297438  10.547018  10.942314   
        60   8.044626  7.389152  10.128528  11.328647  10.503515  10.932906   
        60   7.966093  7.487729  10.141745  11.406319  10.533793  10.982887   
        90   8.164200  7.610294   9.828901  11.023355  10.427788  10.902193   
        90   8.130045  7.425974   9.972879  11.163064  10.339801  10.840847   
        90   8.208286  7.422047   9.754889  11.120437  10.411414  10.935003   
        90   7.935916  7.324434   9.808280  10.977562  10.251636  10.694670   
        90   8.226764  7.399737   9.760450  11.015849  10.545208  10.911852   
        90   8.211148  7.627093   9.841167  11.172223  10.446194  10.801106 

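The second index level is time (15, 30, 60 and 90 minutes), and rows that share the same index are separate replicates. I need to add another index level to this DataFrame that numbers the replicates, but I can't figure out how to do it. Can anyone give me some advice?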
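A minimal reproducible stand-in for this layout may help make the problem concrete. This is a sketch with made-up values and only two time points and three replicates; the level names cond and time are hypothetical, since the original index levels appear to be unnamed. It shows that replicates of the same (condition, time) pair share one index entry, so the index is not unique:

```python
import pandas as pd

# Hypothetical toy data: one condition, two time points, three replicates each.
# The values are made up; only the index layout mirrors the question.
idx = pd.MultiIndex.from_tuples(
    [("Control", t) for t in (15, 15, 15, 30, 30, 30)], names=["cond", "time"]
)
df = pd.DataFrame(
    {"ABHD17C": [8.1, 8.0, 8.2, 7.9, 8.0, 8.1],
     "ADAM19": [7.4, 7.6, 7.5, 7.4, 7.4, 7.3]},
    index=idx,
)

# Replicates of the same (cond, time) share one index entry, so the index
# is not unique and no single replicate can be addressed by label alone.
print(df.index.is_unique)            # False
print(df.index.duplicated().sum())   # 4 rows repeat an earlier entry
```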

This is what I want the output to look like:

symbol        ABHD17C    ADAM19    ADAMTS1    ADAMTS1        ADM        ADM  \
Control 15  1 8.118767  7.418533  10.352104  11.489224  10.868479  11.214037   
        15  2 8.035596  7.623869  10.482607  11.618165  10.569137  11.186141   
        15  3 8.066988  7.480469  10.274919  11.554862  10.744955  11.225492   
        15  4 8.047028  7.547017  10.318972  11.499156  10.831416  11.203028   
        15  5 8.153588  7.507617  10.372795  11.526810  10.844520  11.125373   
        15  6 7.888492  7.494270  10.295306  11.542063  10.750984  11.157360   
        30  1 8.021081  7.417150  10.266634  11.481706  10.685487  11.060077   
        30  2 8.111274  7.430963  10.209418  11.426212  10.662505  11.097215   
        30  3 8.052534  7.359658  10.115304  11.432636  10.597524  11.004115   
        30  4 8.010348  7.391775  10.266453  11.546372  10.572780  11.011062   
        30  5 8.012789  7.420152  10.303099  11.472436  10.708051  11.084203   
        30  6 8.074147  7.331148  10.273618  11.430159  10.649350  11.144293   
        60  1 7.989894  7.419103  10.248334  11.345593  10.541806  10.926970   
        60  2 8.077109  7.417790  10.090361  11.289033  10.458097  10.958197   
        60  3 7.983865  7.492862  10.129944  11.418077  10.522366  10.873455   
        60  4 8.053574  7.441354  10.138749  11.297438  10.547018  10.942314   
        60  5 8.044626  7.389152  10.128528  11.328647  10.503515  10.932906   
        60  6 7.966093  7.487729  10.141745  11.406319  10.533793  10.982887   
        90  1 8.164200  7.610294   9.828901  11.023355  10.427788  10.902193   
        90  2 8.130045  7.425974   9.972879  11.163064  10.339801  10.840847   
        90  3 8.208286  7.422047   9.754889  11.120437  10.411414  10.935003   
        90  4 7.935916  7.324434   9.808280  10.977562  10.251636  10.694670   
        90  5 8.226764  7.399737   9.760450  11.015849  10.545208  10.911852   
        90  6 8.211148  7.627093   9.841167  11.172223  10.446194  10.801106 

1 Answer:

Answer 0 (score: 3)

I think you need cumcount together with set_index:

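# number the rows within each (condition, time) group as 1, 2, 3, ...
df['rep'] = df.groupby(level=[0,1]).cumcount() + 1
# move the new column into the index as a third level
df = df.set_index('rep', append=True)
print(df)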
                 ABHD17C    ADAM19    ADAMTS1    ADAMTS1        ADM        ADM
symbol     rep                                                                
Control 15 1    8.118767  7.418533  10.352104  11.489224  10.868479  11.214037
           2    8.035596  7.623869  10.482607  11.618165  10.569137  11.186141
           3    8.066988  7.480469  10.274919  11.554862  10.744955  11.225492
           4    8.047028  7.547017  10.318972  11.499156  10.831416  11.203028
           5    8.153588  7.507617  10.372795  11.526810  10.844520  11.125373
           6    7.888492  7.494270  10.295306  11.542063  10.750984  11.157360
        30 1    8.021081  7.417150  10.266634  11.481706  10.685487  11.060077
           2    8.111274  7.430963  10.209418  11.426212  10.662505  11.097215
           3    8.052534  7.359658  10.115304  11.432636  10.597524  11.004115
           4    8.010348  7.391775  10.266453  11.546372  10.572780  11.011062
           5    8.012789  7.420152  10.303099  11.472436  10.708051  11.084203
           6    8.074147  7.331148  10.273618  11.430159  10.649350  11.144293
        60 1    7.989894  7.419103  10.248334  11.345593  10.541806  10.926970
           2    8.077109  7.417790  10.090361  11.289033  10.458097  10.958197
           3    7.983865  7.492862  10.129944  11.418077  10.522366  10.873455
           4    8.053574  7.441354  10.138749  11.297438  10.547018  10.942314
           5    8.044626  7.389152  10.128528  11.328647  10.503515  10.932906
           6    7.966093  7.487729  10.141745  11.406319  10.533793  10.982887
        90 1    8.164200  7.610294   9.828901  11.023355  10.427788  10.902193
           2    8.130045  7.425974   9.972879  11.163064  10.339801  10.840847
           3    8.208286  7.422047   9.754889  11.120437  10.411414  10.935003
           4    7.935916  7.324434   9.808280  10.977562  10.251636  10.694670
           5    8.226764  7.399737   9.760450  11.015849  10.545208  10.911852
           6    8.211148  7.627093   9.841167  11.172223  10.446194  10.801106
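The same cumcount + set_index approach can be tried end to end on a small toy frame. This is a sketch; the data and the level names cond and time are hypothetical, not the asker's. Note that cumcount numbers rows in their current order within each group, so the replicate numbers follow the row order of the original frame:

```python
import pandas as pd

# Hypothetical toy frame: 2 time points x 3 replicates (values are made up).
idx = pd.MultiIndex.from_tuples(
    [("Control", t) for t in (15, 15, 15, 30, 30, 30)], names=["cond", "time"]
)
df = pd.DataFrame({"ABHD17C": [8.1, 8.0, 8.2, 7.9, 8.0, 8.1]}, index=idx)

# cumcount() numbers the rows of each (cond, time) group 0, 1, 2, ...
# in their current order; adding 1 makes the replicate numbers start at 1.
df["rep"] = df.groupby(level=[0, 1]).cumcount() + 1

# Move the new column into the index as a third level.
df = df.set_index("rep", append=True)
print(df)
print(df.index.is_unique)  # True: each row now has its own index entry
```

Because cumcount works group by group, this version also copes with groups that have different numbers of replicates.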

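If no rows are ever missing, i.e. every combination of the first two levels has exactly 6 rows, you can use MultiIndex.from_product instead (applied to the original two-level df):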

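import pandas as pd

# levels holds the sorted unique labels of each index level;
# rows are relabelled by position, so df must be sorted and complete
a = df.index.levels[0]
b = df.index.levels[1]
c = [1,2,3,4,5,6]
df.index = pd.MultiIndex.from_product([a,b,c], names=('symbol','a','b'))
print(df)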
               ABHD17C    ADAM19    ADAMTS1    ADAMTS1        ADM        ADM
symbol  a  b                                                                
Control 15 1  8.118767  7.418533  10.352104  11.489224  10.868479  11.214037
           2  8.035596  7.623869  10.482607  11.618165  10.569137  11.186141
           3  8.066988  7.480469  10.274919  11.554862  10.744955  11.225492
           4  8.047028  7.547017  10.318972  11.499156  10.831416  11.203028
           5  8.153588  7.507617  10.372795  11.526810  10.844520  11.125373
           6  7.888492  7.494270  10.295306  11.542063  10.750984  11.157360
        30 1  8.021081  7.417150  10.266634  11.481706  10.685487  11.060077
           2  8.111274  7.430963  10.209418  11.426212  10.662505  11.097215
           3  8.052534  7.359658  10.115304  11.432636  10.597524  11.004115
           4  8.010348  7.391775  10.266453  11.546372  10.572780  11.011062
           5  8.012789  7.420152  10.303099  11.472436  10.708051  11.084203
           6  8.074147  7.331148  10.273618  11.430159  10.649350  11.144293
        60 1  7.989894  7.419103  10.248334  11.345593  10.541806  10.926970
           2  8.077109  7.417790  10.090361  11.289033  10.458097  10.958197
           3  7.983865  7.492862  10.129944  11.418077  10.522366  10.873455
           4  8.053574  7.441354  10.138749  11.297438  10.547018  10.942314
           5  8.044626  7.389152  10.128528  11.328647  10.503515  10.932906
           6  7.966093  7.487729  10.141745  11.406319  10.533793  10.982887
        90 1  8.164200  7.610294   9.828901  11.023355  10.427788  10.902193
           2  8.130045  7.425974   9.972879  11.163064  10.339801  10.840847
           3  8.208286  7.422047   9.754889  11.120437  10.411414  10.935003
           4  7.935916  7.324434   9.808280  10.977562  10.251636  10.694670
           5  8.226764  7.399737   9.760450  11.015849  10.545208  10.911852
           6  8.211148  7.627093   9.841167  11.172223  10.446194  10.801106
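Because from_product relabels rows purely by position, it silently mislabels data if a row is missing or the frame is not sorted. A minimal sketch of a guarded version follows, assuming a constant replicate count n_reps (a hypothetical parameter) and hypothetical level names. As a design choice it takes the level values from get_level_values(...).unique() rather than index.levels, because levels can still contain labels that no longer occur in the data after filtering:

```python
import pandas as pd

# Hypothetical toy frame: 2 time points x 3 replicates, sorted by index.
idx = pd.MultiIndex.from_tuples(
    [("Control", t) for t in (15, 15, 15, 30, 30, 30)], names=["cond", "time"]
)
df = pd.DataFrame({"ABHD17C": [8.1, 8.0, 8.2, 7.9, 8.0, 8.1]}, index=idx)

n_reps = 3  # assumed constant number of replicates per (cond, time) group

# Labels of the first two levels in order of appearance (only labels in use).
a = df.index.get_level_values(0).unique()
b = df.index.get_level_values(1).unique()
sizes = df.groupby(level=[0, 1]).size()

# from_product assigns labels by position, so check its assumptions first.
if not (sizes == n_reps).all():
    raise ValueError("every (cond, time) group must have exactly n_reps rows")
if len(sizes) != len(a) * len(b):
    raise ValueError("every (cond, time) combination must be present")
if not df.index.is_monotonic_increasing:
    raise ValueError("rows must be sorted by the index")

df.index = pd.MultiIndex.from_product(
    [a, b, list(range(1, n_reps + 1))], names=["cond", "time", "rep"]
)
print(df)
```

When these checks are likely to fail, the cumcount approach from the first part of the answer is the safer choice, since it does not depend on row positions across groups.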