My time series DataFrame df looks like this:
time A B C D E
2011-01-04 15:55:00 0.003452 0.005303 0.016632 0.009611 0.000726 ...
2011-01-04 15:56:00 0.004292 0.064709 0.012159 0.020398 0.000272 ...
2011-01-04 15:57:00 0.006617 0.009344 0.018512 0.029696 0.001235 ...
2011-01-04 15:58:00 0.005883 0.048564 0.020213 0.072104 0.001337 ...
2011-01-04 15:59:00 0.008602 0.047318 0.024568 0.051225 0.002775 ...
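I want to add a MultiIndex header that puts each column into a category. For example, I want columns A and B in group 1, C and D in group 2, and E in group 3, so that I end up with: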
group 1 1 2 2 3
time A B C D E
2011-01-04 15:55:00 0.003452 0.005303 0.016632 0.009611 0.000726 ...
2011-01-04 15:56:00 0.004292 0.064709 0.012159 0.020398 0.000272 ...
2011-01-04 15:57:00 0.006617 0.009344 0.018512 0.029696 0.001235 ...
2011-01-04 15:58:00 0.005883 0.048564 0.020213 0.072104 0.001337 ...
2011-01-04 15:59:00 0.008602 0.047318 0.024568 0.051225 0.002775 ...
How can I do this with a conditional statement (e.g. if the column is 'A' or 'B', assign it to group 1, and so on)?
Answer (score: 4)
You can use the MultiIndex.from_tuples method:
In [8]: df
Out[8]:
A B C D E
0 -1.194136 -0.467446 0.535237 -1.714561 0.599319
1 0.575012 -0.737171 0.287243 0.076106 -0.413430
2 0.635213 1.028926 -2.052527 0.922869 0.543720
3 2.411044 -1.162996 1.392117 0.328242 1.253519
4 1.156001 2.223206 1.664754 0.223262 0.854058
5 0.003120 0.064644 -0.981392 -1.468138 -0.454028
6 0.529241 -0.590786 0.380155 -1.315147 1.669510
7 0.163857 0.888864 0.957516 -0.567799 -0.563448
8 1.978112 -0.744517 -0.487620 -2.629270 -0.896176
9 -0.038746 1.543167 0.511869 -0.080620 0.079988
[10 rows x 5 columns]
In [9]: header = [1, 1, 2, 2, 3]
In [10]: df.columns = pd.MultiIndex.from_tuples(list(zip(header, df.columns)))
In [11]: df
Out[11]:
1 2 3
A B C D E
0 -1.194136 -0.467446 0.535237 -1.714561 0.599319
1 0.575012 -0.737171 0.287243 0.076106 -0.413430
2 0.635213 1.028926 -2.052527 0.922869 0.543720
3 2.411044 -1.162996 1.392117 0.328242 1.253519
4 1.156001 2.223206 1.664754 0.223262 0.854058
5 0.003120 0.064644 -0.981392 -1.468138 -0.454028
6 0.529241 -0.590786 0.380155 -1.315147 1.669510
7 0.163857 0.888864 0.957516 -0.567799 -0.563448
8 1.978112 -0.744517 -0.487620 -2.629270 -0.896176
9 -0.038746 1.543167 0.511869 -0.080620 0.079988
[10 rows x 5 columns]
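For reference, here are the same steps as a standalone script instead of an IPython session. The random sample data is a stand-in I made up (it is not the data from the session above), so the printed values will differ.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 10 rows of random values in columns A-E
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 5)), columns=list("ABCDE"))

# One group label per column, in the same order as df.columns
header = [1, 1, 2, 2, 3]

# Pair each group label with its column name: (1, 'A'), (1, 'B'), (2, 'C'), ...
df.columns = pd.MultiIndex.from_tuples(list(zip(header, df.columns)))

print(df.head())

# Selecting by a top-level label returns only that group's columns (C and D here)
print(df[2])
```

Once the top level exists, df[2] gives you the group-2 sub-frame directly, which is usually the point of adding the extra header.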
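In In [9] I wrote the top level of the MultiIndex by hand, but there are many other ways to build it. If you have more columns that come in consecutive pairs, you can generate one label per column with np.arange(len(df.columns)) // 2 + 1, which gives 1, 1, 2, 2, 3, ...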
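The question asks specifically about a condition-based assignment ("if the column is 'A' or 'B', then group 1"). A minimal sketch of that idea, under my own assumptions: the groups dict and the minute-indexed sample frame below are hypothetical, not part of the answer above. The condition is written as a column-to-group mapping, and pd.MultiIndex.from_arrays builds the header, naming the top level "group" to match the desired output in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame shaped like the question's: one row per minute
idx = pd.date_range("2011-01-04 15:55", periods=5, freq="min", name="time")
df = pd.DataFrame(
    np.random.default_rng(1).random((5, 5)), index=idx, columns=list("ABCDE")
)

# Hypothetical column -> group mapping; it replaces a chain of
# "if col in ('A', 'B'): 1 elif col in ('C', 'D'): 2 ..." conditions.
# A column missing from the dict raises KeyError here.
groups = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3}

# Top level: group label per column; bottom level: the original column names
df.columns = pd.MultiIndex.from_arrays(
    [[groups[c] for c in df.columns], df.columns],
    names=["group", None],
)
print(df)

# Alternative when columns simply come in consecutive pairs: 1, 1, 2, 2, 3, ...
pair_labels = np.arange(len(df.columns)) // 2 + 1
```

The dict keeps the grouping rule explicit and works even when related columns are not adjacent; the arithmetic version is shorter but only fits the "every two columns" pattern.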