Transforming a pandas DataFrame using the stack function

Time: 2015-06-17 07:02:30

Tags: python pandas

I have the following pandas DataFrame:

import pandas as pd
import numpy as np
np.random.seed(1)
N = 5
data = pd.DataFrame(np.random.rand(N, 3), columns=['Monday', 'Wednesday', 'Friday'])
data['State'] = 'ST' + pd.Series((np.arange(N) % 19).astype(str))
print(data)
     Monday  Wednesday    Friday State
0  0.417022   0.720324  0.000114   ST0
1  0.302333   0.146756  0.092339   ST1
2  0.186260   0.345561  0.396767   ST2
3  0.538817   0.419195  0.685220   ST3
4  0.204452   0.878117  0.027388   ST4

I want to transform this DataFrame into:

0   ST0   Monday           0.417022
          Wednesday       0.7203245
          Friday       0.0001143748
          State                 ST0
1   ST1   Monday          0.3023326
          Wednesday       0.1467559
          Friday         0.09233859
          State                 ST1
2   ST2   Monday          0.1862602
          Wednesday       0.3455607
          Friday          0.3967675
          State                 ST2
3   ST3   Monday          0.5388167
          Wednesday       0.4191945
          Friday          0.6852195
          State                 ST3
4   ST4   Monday          0.2044522
          Wednesday       0.8781174
          Friday         0.02738759
          State                 ST4

If I use data.stack() on its own, it gives something like this:

0  Monday           0.417022
   Wednesday       0.7203245
   Friday       0.0001143748
   State                 ST0
1  Monday          0.3023326
   Wednesday       0.1467559
   Friday         0.09233859
   State                 ST1
2  Monday          0.1862602
   Wednesday       0.3455607
   Friday          0.3967675
   State                 ST2
3  Monday          0.5388167
   Wednesday       0.4191945
   Friday          0.6852195
   State                 ST3
4  Monday          0.2044522
   Wednesday       0.8781174
   Friday         0.02738759
   State                 ST4

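How can I make the State column the first level of the MultiIndex, with the other columns as the second level?

2 answers:

Answer 0: (score: 1)

You just need to move the State column into the index before stacking: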

data.set_index('State', append=True).stack()
Out[4]: 
   State           
0  ST0    Monday       0.417022
          Wednesday    0.720324
          Friday       0.000114
1  ST1    Monday       0.302333
          Wednesday    0.146756
          Friday       0.092339
2  ST2    Monday       0.186260
          Wednesday    0.345561
          Friday       0.396767
3  ST3    Monday       0.538817
          Wednesday    0.419195
          Friday       0.685220
4  ST4    Monday       0.204452
          Wednesday    0.878117
          Friday       0.027388
dtype: float64

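Note that this does not exactly match the output you posted: I did not include State alongside the days, because I think it makes more sense this way. If you really want it to look like your original output, State would have to stay in the columns as well as in the index.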
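The answer's own snippet for that variant did not survive in this copy, so the following is only a sketch of one way to get there, not necessarily what the answerer wrote. It assumes the `drop=False` option of `set_index`, which leaves State in the columns while also appending it to the index. The frame is rebuilt from the rounded values printed in the question.

```python
import pandas as pd

# Values copied (rounded) from the frame printed in the question.
data = pd.DataFrame({
    "Monday": [0.417022, 0.302333, 0.186260, 0.538817, 0.204452],
    "Wednesday": [0.720324, 0.146756, 0.345561, 0.419195, 0.878117],
    "Friday": [0.000114, 0.092339, 0.396767, 0.685220, 0.027388],
    "State": ["ST0", "ST1", "ST2", "ST3", "ST4"],
})

# Assumption: drop=False keeps State as a regular column while append=True
# adds it to the existing row index, so stack() also emits a State entry
# under every (row, State) pair.
stacked = data.set_index("State", append=True, drop=False).stack()
print(stacked)
```

Because the stacked values now mix floats and strings, the resulting Series has object dtype rather than float64.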

Answer 1: (score: 1)

You can use melt with State as the identifier column (df here is the DataFrame from the question):

In [24]: pd.melt(df, id_vars=['State'])
Out[24]:
   State   variable     value
0    ST0     Monday  0.417022
1    ST1     Monday  0.302333
2    ST2     Monday  0.186260
3    ST3     Monday  0.538817
4    ST4     Monday  0.204452
5    ST0  Wednesday  0.720324
6    ST1  Wednesday  0.146756
7    ST2  Wednesday  0.345561
8    ST3  Wednesday  0.419195
9    ST4  Wednesday  0.878117
10   ST0     Friday  0.000114
11   ST1     Friday  0.092339
12   ST2     Friday  0.396767
13   ST3     Friday  0.685220
14   ST4     Friday  0.027388
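The melted frame is flat, whereas the question asked for State as the first index level. As a rough sketch (the variable names `by_state` and `stacked` are made up for illustration), the melted result can be moved into a two-level index, or State can be made the only index before stacking:

```python
import pandas as pd

# Values copied (rounded) from the frame printed in the question.
data = pd.DataFrame({
    "Monday": [0.417022, 0.302333, 0.186260, 0.538817, 0.204452],
    "Wednesday": [0.720324, 0.146756, 0.345561, 0.419195, 0.878117],
    "Friday": [0.000114, 0.092339, 0.396767, 0.685220, 0.027388],
    "State": ["ST0", "ST1", "ST2", "ST3", "ST4"],
})

# Option 1: melt to long form, then index by (State, variable).
long_form = pd.melt(data, id_vars=["State"])
by_state = long_form.set_index(["State", "variable"])["value"]

# Option 2: make State the sole index, then stack the day columns.
# This replaces the 0..4 row labels instead of keeping them.
stacked = data.set_index("State").stack()

print(by_state)
print(stacked)
```

Both hold the same 15 values keyed by state and day. Option 2 keeps each state's days together in column order, while option 1 keeps melt's ordering (all Mondays first, then Wednesdays, then Fridays).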