I have the following DataFrame:
beat1 beat2 beat3 beat4 beat5 beat6 beat7
filename
M40_HC_503d.dat 0.7456 0.8574 0.7695 0.8698 0.8315 0.7908 0.8823
M30_HC_461d.dat 0.7672 0.6682 0.7452 0.6853 0.7488 0.6782 0.6648
M24_HC_459d.dat 0.6041 0.6439 0.5870 0.7452 0.6714 0.6684 0.6198
M48_HC_543d.dat 0.8949 0.8570 0.9338 1.0545 1.0681 1.0775 0.8425
M40_HC_506d.dat 0.7862 0.8917 0.9357 0.8250 0.8521 0.7146 0.7125
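I would like to create another DataFrame in which the column names beat1 to beat7 become the index, and which has two columns. The first column should hold all the values from beat1 to beat7, and the second column should hold the corresponding filename. Like this: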
values filename
ind
0 0.7456 M40_HC_503d.dat
1 0.8574 M40_HC_503d.dat
2 0.7695 M40_HC_503d.dat
3 0.8698 M40_HC_503d.dat
4 0.8315 M40_HC_503d.dat
5 0.7908 M40_HC_503d.dat
6 0.8823 M40_HC_503d.dat
7 0.7672 M30_HC_461d.dat
8 0.6682 M30_HC_461d.dat
9 0.7452 M30_HC_461d.dat
10 0.6853 M30_HC_461d.dat
11 0.7488 M30_HC_461d.dat
12 0.6782 M30_HC_461d.dat
13 0.6648 M30_HC_461d.dat
I have tried many things, including transposing, but nothing has worked for me. Any ideas?
Answer 0 (score: 2)
I think you need stack:
df = df.stack().reset_index(0, name='values')
print (df)
filename values
beat1 M40_HC_503d.dat 0.7456
beat2 M40_HC_503d.dat 0.8574
beat3 M40_HC_503d.dat 0.7695
beat4 M40_HC_503d.dat 0.8698
beat5 M40_HC_503d.dat 0.8315
beat6 M40_HC_503d.dat 0.7908
beat7 M40_HC_503d.dat 0.8823
beat1 M30_HC_461d.dat 0.7672
beat2 M30_HC_461d.dat 0.6682
beat3 M30_HC_461d.dat 0.7452
beat4 M30_HC_461d.dat 0.6853
beat5 M30_HC_461d.dat 0.7488
beat6 M30_HC_461d.dat 0.6782
...
Or maybe, if you want a default integer index:
df = df.stack().reset_index(0, name='values').reset_index(drop=True)
print (df)
filename values
0 M40_HC_503d.dat 0.7456
1 M40_HC_503d.dat 0.8574
2 M40_HC_503d.dat 0.7695
3 M40_HC_503d.dat 0.8698
4 M40_HC_503d.dat 0.8315
5 M40_HC_503d.dat 0.7908
6 M40_HC_503d.dat 0.8823
7 M30_HC_461d.dat 0.7672
8 M30_HC_461d.dat 0.6682
9 M30_HC_461d.dat 0.7452
10 M30_HC_461d.dat 0.6853
...
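For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds only the first two rows of the question's frame as sample data, and the names long_df and the final column reordering / rename_axis("ind") step are my own additions to match the layout requested in the question, not part of the original answer.

```python
import pandas as pd

# Sample data: the first two rows from the question, indexed by filename.
df = pd.DataFrame(
    {
        "beat1": [0.7456, 0.7672],
        "beat2": [0.8574, 0.6682],
        "beat3": [0.7695, 0.7452],
        "beat4": [0.8698, 0.6853],
        "beat5": [0.8315, 0.7488],
        "beat6": [0.7908, 0.6782],
        "beat7": [0.8823, 0.6648],
    },
    index=pd.Index(["M40_HC_503d.dat", "M30_HC_461d.dat"], name="filename"),
)

long_df = (
    df.stack()                             # Series indexed by (filename, beat)
    .reset_index(level=0, name="values")   # move filename out into a column
    .reset_index(drop=True)                # drop the beat labels, use 0..n-1
    [["values", "filename"]]               # column order used in the question
    .rename_axis("ind")                    # index name used in the question
)
print(long_df)
```

Newer pandas releases may print a FutureWarning about the stack implementation; with no missing values in the frame the result should be the same either way.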
If you need to change the index so that it keeps only the beat number:
df = df.stack().reset_index(0, name='values')
df.index = df.index.str.extract(r'(\d+)', expand=False)
print (df)
filename values
1 M40_HC_503d.dat 0.7456
2 M40_HC_503d.dat 0.8574
3 M40_HC_503d.dat 0.7695
4 M40_HC_503d.dat 0.8698
5 M40_HC_503d.dat 0.8315
6 M40_HC_503d.dat 0.7908
7 M40_HC_503d.dat 0.8823
1 M30_HC_461d.dat 0.7672
2 M30_HC_461d.dat 0.6682
...
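One thing to be aware of: str.extract returns the captured digits as strings, so the index above holds '1', '2', ... rather than numbers. If a numeric index is wanted, a cast can be added. A small sketch, using a two-column sample frame and the name out as assumptions of mine:

```python
import pandas as pd

# Reduced sample: two beats for two files, indexed by filename.
df = pd.DataFrame(
    {"beat1": [0.7456, 0.7672], "beat2": [0.8574, 0.6682]},
    index=pd.Index(["M40_HC_503d.dat", "M30_HC_461d.dat"], name="filename"),
)

out = df.stack().reset_index(level=0, name="values")

# Pull the digits out of 'beat1', 'beat2', ... and convert them to integers.
out.index = out.index.str.extract(r"(\d+)", expand=False).astype(int)
print(out)
```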
Answer 1 (score: 2)
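import numpy as np
import pandas as pd

v = df.values        # 2-D array of the beat values
i = df.index.values  # array of filenames
# Flatten v row by row and repeat each filename once per beat column
pd.DataFrame(
    np.hstack([v.reshape(-1, 1), i.repeat(v.shape[1])[:, None]]),
    columns=['values', 'filename']
)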
values filename
0 0.7456 M40_HC_503d.dat
1 0.8574 M40_HC_503d.dat
2 0.7695 M40_HC_503d.dat
3 0.8698 M40_HC_503d.dat
4 0.8315 M40_HC_503d.dat
5 0.7908 M40_HC_503d.dat
6 0.8823 M40_HC_503d.dat
7 0.7672 M30_HC_461d.dat
8 0.6682 M30_HC_461d.dat
9 0.7452 M30_HC_461d.dat
...
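A caveat on the NumPy approach: stacking the float values next to the filename strings with np.hstack yields a single object-dtype array, so the values column comes out as object rather than float. If that matters, one possible variation (my own sketch, not the answerer's code; the sample data and the name result are assumptions) is to pass the two arrays as separate columns:

```python
import numpy as np
import pandas as pd

# Reduced sample: two beats for two files, indexed by filename.
df = pd.DataFrame(
    {"beat1": [0.7456, 0.7672], "beat2": [0.8574, 0.6682]},
    index=pd.Index(["M40_HC_503d.dat", "M30_HC_461d.dat"], name="filename"),
)

v = df.to_numpy()
result = pd.DataFrame(
    {
        # Row-major flatten keeps each file's beats together and stays float64.
        "values": v.ravel(),
        # Repeat each filename once per beat column.
        "filename": np.repeat(df.index.to_numpy(), v.shape[1]),
    }
)
result.index.name = "ind"
print(result)
```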