熊猫:多索引列标题

时间:2014-05-21 20:45:51

标签: python pandas indexing

我的时间序列DataFrame df如下所示:

time                               A                    B                    C                      D                E
2011-01-04 15:55:00             0.003452             0.005303             0.016632             0.009611             0.000726 ...
2011-01-04 15:56:00             0.004292             0.064709             0.012159             0.020398             0.000272 ...
2011-01-04 15:57:00             0.006617             0.009344             0.018512             0.029696             0.001235 ...
2011-01-04 15:58:00             0.005883             0.048564             0.020213             0.072104             0.001337 ...
2011-01-04 15:59:00             0.008602             0.047318             0.024568             0.051225             0.002775 ...

我想添加一个multiindex标头来对每列进行分类。因此,例如,我想将A列和B列分组到第1组,将C和D分为第2组,将第3组分为E组。因此,我希望:

    group                              1                    1                    2                     2                 3
    time                               A                    B                    C                     D                 E
    2011-01-04 15:55:00             0.003452             0.005303             0.016632             0.009611             0.000726 ...
    2011-01-04 15:56:00             0.004292             0.064709             0.012159             0.020398             0.000272 ...
    2011-01-04 15:57:00             0.006617             0.009344             0.018512             0.029696             0.001235 ...
    2011-01-04 15:58:00             0.005883             0.048564             0.020213             0.072104             0.001337 ...
    2011-01-04 15:59:00             0.008602             0.047318             0.024568             0.051225             0.002775 ...

如何使用条件语句(如果列=='A'|'B'然后分配给组1等等)?

1 个答案:

答案 0 :(得分:4)

您可以使用MultiIndex.from_tuples方法:

In [8]: df
Out[8]: 
          A         B         C         D         E
0 -1.194136 -0.467446  0.535237 -1.714561  0.599319
1  0.575012 -0.737171  0.287243  0.076106 -0.413430
2  0.635213  1.028926 -2.052527  0.922869  0.543720
3  2.411044 -1.162996  1.392117  0.328242  1.253519
4  1.156001  2.223206  1.664754  0.223262  0.854058
5  0.003120  0.064644 -0.981392 -1.468138 -0.454028
6  0.529241 -0.590786  0.380155 -1.315147  1.669510
7  0.163857  0.888864  0.957516 -0.567799 -0.563448
8  1.978112 -0.744517 -0.487620 -2.629270 -0.896176
9 -0.038746  1.543167  0.511869 -0.080620  0.079988

[10 rows x 5 columns]

In [9]: header = [1, 1, 2, 2, 3, 3]

In [10]: df.columns = pd.MultiIndex.from_tuples(list(zip(header, df.columns)))

In [11]: df
Out[11]: 
          1                   2                   3
          A         B         C         D         E
0 -1.194136 -0.467446  0.535237 -1.714561  0.599319
1  0.575012 -0.737171  0.287243  0.076106 -0.413430
2  0.635213  1.028926 -2.052527  0.922869  0.543720
3  2.411044 -1.162996  1.392117  0.328242  1.253519
4  1.156001  2.223206  1.664754  0.223262  0.854058
5  0.003120  0.064644 -0.981392 -1.468138 -0.454028
6  0.529241 -0.590786  0.380155 -1.315147  1.669510
7  0.163857  0.888864  0.957516 -0.567799 -0.563448
8  1.978112 -0.744517 -0.487620 -2.629270 -0.896176
9 -0.038746  1.543167  0.511869 -0.080620  0.079988

[10 rows x 5 columns]

In [9],我可以通过多种不同的方式创建MultiIndex的顶级。如果您有更多cols,您可以执行np.arange(len(df.columns)).repeat(2)

之类的操作