Adding a new level to a DataFrame

Date: 2018-12-29 09:01:20

Tags: python pandas dataframe multi-index

I have created a DataFrame that looks like this:

 
df= 
        id      var0    var1    var2    var3    var4 ...  var137
        5171    10.0    2.8     0.0     5.0     1.0  ...  9.4  
        5171    40.9    2.5     3.4     4.5     1.3  ...  7.7  
        5171    60.7    3.1     5.2     6.6     3.4  ...  1.0
        ...
        5171    0.5     1.3     5.1     0.5     0.2  ...  0.4
        4567    1.5     2.0     1.0     4.5     0.1  ...  0.4  
        4567    4.4     2.0     1.3     6.4     0.1  ...  3.3  
        4567    6.3     3.0     1.5     7.6     1.6  ...  1.6
        ...
        4567    0.7     1.4     1.4     0.3     4.2  ...  1.7
       ... 
        9584    0.3     2.6     0.0     5.2     1.6  ...  9.7  
        9584    0.5     1.2     8.3     3.4     1.3  ...  1.7  
        9584    0.7     3.0     5.6     6.6     3.0  ...  1.0
        ...
        9584    0.7     1.3     0.1     0.0     2.0  ...  1.7

Each id has 58 rows. I need to add a new level, let's call it uniq_id, to this DataFrame so that the end result looks like this:

df= 
  uniq_id      id      var0    var1    var2    var3    var4 ...  var137
    0          5171    10.0    2.8     0.0     5.0     1.0  ...  9.4  
    1          5171    40.9    2.5     3.4     4.5     1.3  ...  7.7  
    2          5171    60.7    3.1     5.2     6.6     3.4  ...  1.0
   ...
   57          5171    0.5     1.3     5.1     0.5     0.2  ...  0.4
    0          4567    1.5     2.0     1.0     4.5     0.1  ...  0.4  
    1          4567    4.4     2.0     1.3     6.4     0.1  ...  3.3  
    2          4567    6.3     3.0     1.5     7.6     1.6  ...  1.6
    ...
   57          4567    0.7     1.4     1.4     0.3     4.2  ...  1.7
    ... 
    0          9584    0.3     2.6     0.0     5.2     1.6  ...  9.7  
    1          9584    0.5     1.2     8.3     3.4     1.3  ...  1.7  
    2          9584    0.7     3.0     5.6     6.6     3.0  ...  1.0
    ...
    57         9584    0.7     1.3     0.1     0.0     2.0  ...  1.7

I tried:

n_t = range(0, 58)
pd.concat([df], keys=n_t, names=['uniq_id'])

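but this sets every value of uniq_id to 0. I also tried creating an empty MultiIndex based on this post, then taking the slice of the DataFrame for each id and adding it to that MultiIndex, but that failed. How can I solve this?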

1 Answer:

Answer 0 (score: 2)

What you are describing is a cumulative count within each id:

df['uniq_id'] = df.groupby('id').cumcount() 

You can then add it to the index with

df.set_index(['id', 'uniq_id']) # If id is a regular column

or, if id is already the index:

df.set_index('uniq_id', append=True) # If id was already an Index

This gives you a MultiIndex. Output:

              var0  var1  var2  var3  var4  var137
id   uniq_id                                      
5171 0        10.0   2.8   0.0   5.0   1.0     9.4
     1        40.9   2.5   3.4   4.5   1.3     7.7
     2        60.7   3.1   5.2   6.6   3.4     1.0
     3         0.5   1.3   5.1   0.5   0.2     0.4
4567 0         1.5   2.0   1.0   4.5   0.1     0.4
     1         4.4   2.0   1.3   6.4   0.1     3.3
     2         6.3   3.0   1.5   7.6   1.6     1.6
     3         0.7   1.4   1.4   0.3   4.2     1.7
9584 0         0.3   2.6   0.0   5.2   1.6     9.7
     1         0.5   1.2   8.3   3.4   1.3     1.7
     2         0.7   3.0   5.6   6.6   3.0     1.0
     3         0.7   1.3   0.1   0.0   2.0     1.7
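
For reference, here is a minimal, self-contained sketch of both variants above. The six-row frame is a hypothetical stand-in for the question's data (three ids, two rows each, only var0 and var1), and the names indexed, by_index and appended are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the question's frame: 3 ids, 2 rows each.
df = pd.DataFrame({
    "id": [5171, 5171, 4567, 4567, 9584, 9584],
    "var0": [10.0, 40.9, 1.5, 4.4, 0.3, 0.5],
    "var1": [2.8, 2.5, 2.0, 2.0, 2.6, 1.2],
})

# Variant 1: id is a regular column.
# cumcount() numbers the rows within each id from 0, in order of appearance.
df["uniq_id"] = df.groupby("id").cumcount()
indexed = df.set_index(["id", "uniq_id"])

# Variant 2: id is already the index, so group on the index level
# and append the counter as a second level.
by_index = df.drop(columns="uniq_id").set_index("id")
by_index["uniq_id"] = by_index.groupby(level="id").cumcount()
appended = by_index.set_index("uniq_id", append=True)

# Look up a single row by its (id, uniq_id) pair.
row = indexed.loc[(4567, 1)]
print(indexed)
```

Both variants should end up with the same two-level index. Note that set_index returns a new DataFrame rather than modifying df, so the result needs to be assigned to a name.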