I have a Pandas DataFrame that looks like this:
index | p1 | a1 | phase | file_number | e1 |
---|---|---|---|---|---|
388 | 19.288 | 21.630 | 0.0 | 0 | 0.0 |
389 | 40.910 | 71.489 | 1.0 | 0 | 0.0 |
390 | 31.310 | 43.952 | 2.0 | 0 | 0.0 |
391 | 28.420 | 30.250 | 3.0 | 0 | 0.0 |
392 | 17.940 | 22.000 | 0.0 | 1 | 0.0 |
393 | 38.020 | 68.750 | 1.0 | 1 | 0.0 |
394 | 31.230 | 48.352 | 2.0 | 1 | 1.0 |
395 | 26.902 | 29.880 | 3.0 | 1 | 0.0 |
It can be created with this code:
import pandas as pd

d = {'p1': {388: 19.288,389: 40.91,390: 31.31,391: 28.42,392: 17.94,393: 38.02,394: 31.23,395: 26.902},
'a1': {388: 21.63,389: 71.489,390: 43.952,391: 30.25,392: 22.0,393: 68.75,394: 48.352,395: 29.88},
'phase': {388: 0.0,389: 1.0,390: 2.0,391: 3.0,392: 0.0,393: 1.0,394: 2.0,395: 3.0},
'file_number': {388: 0, 389: 0, 390: 0, 391: 0, 392: 1, 393: 1, 394: 1, 395: 1},
'e1': {388: 0.0,389: 0.0,390: 0.0,391: 0.0,392: 0.0,393: 0.0,394: 1.0,395: 0.0}}
df = pd.DataFrame(d)
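I want to transform this DataFrame so that each file_number has exactly one row, pivoting on phase. In other words, the multiple rows for each file_number should be collapsed into a single row. The phase values will always be 0, 1, 2, 3. The final table should look like this: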
p1_0 | p1_1 | p1_2 | p1_3 | a1_0 | a1_1 | a1_2 | a1_3 | e1_0 | e1_1 | e1_2 | e1_3 |
---|---|---|---|---|---|---|---|---|---|---|---|
19.288 | 40.910 | 31.310 | 28.420 | 21.630 | 71.489 | 43.952 | 30.250 | 0 | 0 | 0 | 0 |
17.940 | 38.020 | 31.230 | 26.902 | 22.000 | 68.750 | 48.352 | 29.880 | 0 | 0 | 1 | 0 |
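where the suffix is the phase, i.e. p1_<phase>, a1_<phase>, and so on.
I would like this to run as fast as possible. Since my data is very large, I would rather avoid loops.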
Answer 0 (score: 1)
import pandas as pd

d = {'p1': {388: 19.288,389: 40.91,390: 31.31,391: 28.42,392: 17.94,393: 38.02,394: 31.23,395: 26.902},
'a1': {388: 21.63,389: 71.489,390: 43.952,391: 30.25,392: 22.0,393: 68.75,394: 48.352,395: 29.88},
'phase': {388: 0.0,389: 1.0,390: 2.0,391: 3.0,392: 0.0,393: 1.0,394: 2.0,395: 3.0},
'file_number': {388: 0, 389: 0, 390: 0, 391: 0, 392: 1, 393: 1, 394: 1, 395: 1},
'e1': {388: 0.0,389: 0.0,390: 0.0,391: 0.0,392: 0.0,393: 0.0,394: 1.0,395: 0.0}}
df = pd.DataFrame(d)
# pivot the data
pivoted = df.pivot(index='file_number', columns='phase')
# flatten the columns
pivoted.columns = [f'{col[0]}_{int(col[1])}' for col in pivoted.columns.values]
After this, pivoted is a DataFrame with the desired shape, with one row per file_number.
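For comparison, here is a minimal sketch of what I believe is an equivalent route, using set_index followed by unstack instead of pivot. The sample values are taken from the table in the question, and the variable name wide is just my own choice for illustration. It assumes every (file_number, phase) pair occurs exactly once.

```python
import pandas as pd

# Sample data from the table in the question (e1 is 1.0 for file 1, phase 2).
df = pd.DataFrame({
    'p1': [19.288, 40.91, 31.31, 28.42, 17.94, 38.02, 31.23, 26.902],
    'a1': [21.63, 71.489, 43.952, 30.25, 22.0, 68.75, 48.352, 29.88],
    'phase': [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0],
    'file_number': [0, 0, 0, 0, 1, 1, 1, 1],
    'e1': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
}, index=range(388, 396))

# Move file_number and phase into the index, then turn phase into columns.
wide = df.set_index(['file_number', 'phase']).unstack('phase')

# Flatten the (name, phase) MultiIndex into names such as 'p1_0'.
wide.columns = [f'{name}_{int(phase)}' for name, phase in wide.columns]

print(wide)
```

Neither version uses a Python-level loop over rows; the only comprehension runs over the column labels. If a (file_number, phase) pair can appear more than once, both pivot and unstack raise a ValueError, and pivot_table with an explicit aggregation function would be the usual fallback.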
This is basically a combination of these two questions: