Using index values as category values in a pandas dataframe

Time: 2017-03-16 06:45:35

Tags: python python-3.x pandas dataframe

I have the following dataframe:

                  beat1   beat2   beat3   beat4   beat5   beat6   beat7  
filename                                                                  
M40_HC_503d.dat  0.7456  0.8574  0.7695  0.8698  0.8315  0.7908  0.8823   
M30_HC_461d.dat  0.7672  0.6682  0.7452  0.6853  0.7488  0.6782  0.6648   
M24_HC_459d.dat  0.6041  0.6439  0.5870  0.7452  0.6714  0.6684  0.6198   
M48_HC_543d.dat  0.8949  0.8570  0.9338  1.0545  1.0681  1.0775  0.8425   
M40_HC_506d.dat  0.7862  0.8917  0.9357  0.8250  0.8521  0.7146  0.7125

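I would like to create another dataframe in which the column names beat1 to beat7 become the index, and which has two columns. The first column should hold all the values from beat1 to beat7, and the second column should hold the corresponding filename. Like this: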

    values   filename
ind   
0   0.7456  M40_HC_503d.dat
1   0.8574  M40_HC_503d.dat
2   0.7695  M40_HC_503d.dat
3   0.8698  M40_HC_503d.dat
4   0.8315  M40_HC_503d.dat
5   0.7908  M40_HC_503d.dat
6   0.8823  M40_HC_503d.dat
7   0.7672  M30_HC_461d.dat
8   0.6682  M30_HC_461d.dat
9   0.7452  M30_HC_461d.dat
10  0.6853  M30_HC_461d.dat
11  0.7488  M30_HC_461d.dat
12  0.6782  M30_HC_461d.dat
13  0.6648  M30_HC_461d.dat

I have tried many things, including transposing, but nothing has worked for me. Any ideas?

2 Answers:

Answer 0: (score: 2)

I think you need stack:

df = df.stack().reset_index(0, name='values')
print (df)
              filename  values
beat1  M40_HC_503d.dat  0.7456
beat2  M40_HC_503d.dat  0.8574
beat3  M40_HC_503d.dat  0.7695
beat4  M40_HC_503d.dat  0.8698
beat5  M40_HC_503d.dat  0.8315
beat6  M40_HC_503d.dat  0.7908
beat7  M40_HC_503d.dat  0.8823
beat1  M30_HC_461d.dat  0.7672
beat2  M30_HC_461d.dat  0.6682
beat3  M30_HC_461d.dat  0.7452
beat4  M30_HC_461d.dat  0.6853
beat5  M30_HC_461d.dat  0.7488
beat6  M30_HC_461d.dat  0.6782
...

Or maybe:

df = df.stack().reset_index(0, name='values').reset_index(drop=True)
print (df)
           filename  values
0   M40_HC_503d.dat  0.7456
1   M40_HC_503d.dat  0.8574
2   M40_HC_503d.dat  0.7695
3   M40_HC_503d.dat  0.8698
4   M40_HC_503d.dat  0.8315
5   M40_HC_503d.dat  0.7908
6   M40_HC_503d.dat  0.8823
7   M30_HC_461d.dat  0.7672
8   M30_HC_461d.dat  0.6682
9   M30_HC_461d.dat  0.7452
10  M30_HC_461d.dat  0.6853
...
...
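
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same stack approach. It rebuilds only the first two rows of the question's frame by hand (that construction, and the names `stacked` and `long_df`, are assumptions for illustration), and it reorders the columns and names the index `ind` to match the layout requested in the question.

```python
import pandas as pd

# Rebuild the first two rows of the question's frame (assumed construction).
df = pd.DataFrame(
    [[0.7456, 0.8574, 0.7695, 0.8698, 0.8315, 0.7908, 0.8823],
     [0.7672, 0.6682, 0.7452, 0.6853, 0.7488, 0.6782, 0.6648]],
    index=pd.Index(["M40_HC_503d.dat", "M30_HC_461d.dat"], name="filename"),
    columns=[f"beat{k}" for k in range(1, 8)],
)

# stack() moves the column labels into an inner index level, giving a
# Series indexed by (filename, beat label).
stacked = df.stack()

# Name the Series 'values', turn the 'filename' level back into a column,
# then drop the leftover beat labels in favour of a 0..n-1 integer index.
long_df = (
    stacked.rename("values")
    .reset_index(level="filename")
    .reset_index(drop=True)
)[["values", "filename"]]
long_df.index.name = "ind"

print(long_df)
```

Note that a column called `values` has to be accessed as `long_df["values"]`, because `long_df.values` is the built-in attribute that returns the underlying array.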

If you need to change the index:

df = df.stack().reset_index(0, name='values')
df.index = df.index.str.extract(r'(\d+)', expand=False)
print (df)
          filename  values
1  M40_HC_503d.dat  0.7456
2  M40_HC_503d.dat  0.8574
3  M40_HC_503d.dat  0.7695
4  M40_HC_503d.dat  0.8698
5  M40_HC_503d.dat  0.8315
6  M40_HC_503d.dat  0.7908
7  M40_HC_503d.dat  0.8823
1  M30_HC_461d.dat  0.7672
2  M30_HC_461d.dat  0.6682
...
...
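
One detail worth knowing about the last variant: `str.extract` returns the captured digits as strings, so the new index holds `'1'`, `'2'`, ... rather than integers. A small sketch, assuming you want a numeric beat number instead (the two-column frame and the name `out` are made up for illustration):

```python
import pandas as pd

# Tiny stand-in for the question's frame (assumed data, two beats only).
df = pd.DataFrame(
    [[0.7456, 0.8574], [0.7672, 0.6682]],
    index=pd.Index(["M40_HC_503d.dat", "M30_HC_461d.dat"], name="filename"),
    columns=["beat1", "beat2"],
)

out = df.stack().reset_index(0, name="values")

# Pull the digits out of 'beat1', 'beat2', ... and convert them to integers.
out.index = out.index.str.extract(r"(\d+)", expand=False).astype(int)

print(out)
```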

Answer 1: (score: 2)

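import numpy as np
import pandas as pd

v = df.values        # 2-D array of beat values, one row per file
i = df.index.values  # 1-D array of filenames

# Flatten the values into one column and repeat each filename once per beat
pd.DataFrame(
    np.hstack([v.reshape(-1, 1), i.repeat(v.shape[1])[:, None]]),
    columns=['values', 'filename']
)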

   values         filename
0  0.7456  M40_HC_503d.dat
1  0.8574  M40_HC_503d.dat
2  0.7695  M40_HC_503d.dat
3  0.8698  M40_HC_503d.dat
4  0.8315  M40_HC_503d.dat
5  0.7908  M40_HC_503d.dat
6  0.8823  M40_HC_503d.dat
7  0.7672  M30_HC_461d.dat
8  0.6682  M30_HC_461d.dat
9  0.7452  M30_HC_461d.dat
...
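
A caveat on the NumPy version: a single NumPy array holds one dtype, so stacking the floats next to the filenames means the `values` column does not stay `float64` (it will most likely come back as `object`). A possible variation, sketched here under the assumption that you want to keep numeric values, builds the two columns separately; the two-row frame and the name `out` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the question's frame (assumed data, three beats only).
df = pd.DataFrame(
    [[0.7456, 0.8574, 0.7695], [0.7672, 0.6682, 0.7452]],
    index=pd.Index(["M40_HC_503d.dat", "M30_HC_461d.dat"], name="filename"),
    columns=["beat1", "beat2", "beat3"],
)

v = df.to_numpy()        # 2-D float array, shape (n_files, n_beats)
i = df.index.to_numpy()  # 1-D array of filenames

out = pd.DataFrame({
    # Row-major flatten: all beats of the first file, then the second, ...
    "values": v.ravel(),
    # Each filename repeated once per beat so it lines up with the values.
    "filename": np.repeat(i, v.shape[1]),
})

print(out)
print(out.dtypes)
```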