Pandas: converting rows to columns

Time: 2021-07-14 11:31:00

Tags: python pandas dataframe

I have a Pandas DataFrame that looks like this:

index      p1      a1  phase  file_number   e1
388    19.288  21.630    0.0            0  0.0
389    40.910  71.489    1.0            0  0.0
390    31.310  43.952    2.0            0  0.0
391    28.420  30.250    3.0            0  0.0
392    17.940  22.000    0.0            1  0.0
393    38.020  68.750    1.0            1  0.0
394    31.230  48.352    2.0            1  1.0
395    26.902  29.880    3.0            1  0.0

It can be created with this code:

import pandas as pd

d = {'p1': {388: 19.288,389: 40.91,390: 31.31,391: 28.42,392: 17.94,393: 38.02,394: 31.23,395: 26.902},
     'a1': {388: 21.63,389: 71.489,390: 43.952,391: 30.25,392: 22.0,393: 68.75,394: 48.352,395: 29.88},
     'phase': {388: 0.0,389: 1.0,390: 2.0,391: 3.0,392: 0.0,393: 1.0,394: 2.0,395: 3.0},
     'file_number': {388: 0, 389: 0, 390: 0, 391: 0, 392: 1, 393: 1, 394: 1, 395: 1},
     'e1': {388: 0.0,389: 0.0,390: 0.0,391: 0.0,392: 0.0,393: 1.0,394: 0.0,395: 0.0}}

df = pd.DataFrame(d)

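I want to reshape this DataFrame so that each file_number has exactly one row, pivoting on phase. In other words, the multiple rows for each file_number should be collapsed into a single row. The phase values will always be 0, 1, 2, 3. The final table should look like this: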

p1_0    p1_1    p1_2    p1_3    a1_0    a1_1    a1_2    a1_3    e1_0  e1_1  e1_2  e1_3
19.288  40.910  31.310  28.420  21.630  71.489  43.952  30.250  0     0     0     0
17.940  38.020  31.230  26.902  22.000  68.750  48.352  29.880  0     0     1     0

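Here the suffix is the phase, i.e. the columns are named p1_<phase>, a1_<phase>, and so on.

I want this to run as fast as possible. Since my data is very large, I would rather avoid loops.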

1 Answer:

Answer 0: (score: 1)

import pandas as pd

d = {'p1': {388: 19.288,389: 40.91,390: 31.31,391: 28.42,392: 17.94,393: 38.02,394: 31.23,395: 26.902},
     'a1': {388: 21.63,389: 71.489,390: 43.952,391: 30.25,392: 22.0,393: 68.75,394: 48.352,395: 29.88},
     'phase': {388: 0.0,389: 1.0,390: 2.0,391: 3.0,392: 0.0,393: 1.0,394: 2.0,395: 3.0},
     'file_number': {388: 0, 389: 0, 390: 0, 391: 0, 392: 1, 393: 1, 394: 1, 395: 1},
     'e1': {388: 0.0,389: 0.0,390: 0.0,391: 0.0,392: 0.0,393: 1.0,394: 0.0,395: 0.0}}

df = pd.DataFrame(d)
# pivot the data
pivoted = df.pivot(index='file_number', columns='phase')
# flatten the two-level (name, phase) columns into names such as 'p1_0'
pivoted.columns = [f'{name}_{int(phase)}' for name, phase in pivoted.columns]

After this, pivoted is a DataFrame with the desired shape.
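For reference, here is a minimal, self-contained sketch of the same idea wrapped in a function, using the sample data from the question's code. The helper name `widen_by_phase` and its parameters are made up for illustration and are not part of pandas. The sketch also shows a `set_index` plus `unstack` spelling, which should give the same columns and values on this data.

```python
import pandas as pd

# Sample data copied from the question's code.
d = {'p1': {388: 19.288, 389: 40.91, 390: 31.31, 391: 28.42,
            392: 17.94, 393: 38.02, 394: 31.23, 395: 26.902},
     'a1': {388: 21.63, 389: 71.489, 390: 43.952, 391: 30.25,
            392: 22.0, 393: 68.75, 394: 48.352, 395: 29.88},
     'phase': {388: 0.0, 389: 1.0, 390: 2.0, 391: 3.0,
               392: 0.0, 393: 1.0, 394: 2.0, 395: 3.0},
     'file_number': {388: 0, 389: 0, 390: 0, 391: 0,
                     392: 1, 393: 1, 394: 1, 395: 1},
     'e1': {388: 0.0, 389: 0.0, 390: 0.0, 391: 0.0,
            392: 0.0, 393: 1.0, 394: 0.0, 395: 0.0}}
df = pd.DataFrame(d)


def widen_by_phase(frame, index="file_number", columns="phase"):
    """Collapse all rows sharing one `index` value into a single row.

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """
    wide = frame.pivot(index=index, columns=columns)
    # pivot leaves a two-level column index of (name, phase) pairs;
    # join each pair into a flat name such as 'p1_0'.
    wide.columns = [f"{name}_{int(phase)}" for name, phase in wide.columns]
    return wide


wide = widen_by_phase(df)

# Alternative spelling: move both keys into the index, then unstack the phase.
alt = df.set_index(["file_number", "phase"]).unstack("phase")
alt.columns = [f"{name}_{int(phase)}" for name, phase in alt.columns]

print(wide.shape)  # (2, 12): one row per file_number, 3 value columns x 4 phases
```

Neither version uses a Python-level loop over the rows. Note that `pivot` raises a `ValueError` if a (file_number, phase) pair occurs more than once; in that case `pivot_table` with an explicit `aggfunc` may be a better fit.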

This is basically a combination of these two questions: