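Pandas: reshaping long to wide with several variables

Time: 2021-02-26 15:19:13

Tags: python pandas

I want to reshape my long DataFrame to wide format, with the data laid out side by side per Session. In this example, Session runs from 1 to 10.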

       Session  Tube  Window  Counts  Length
0            1     1       1     0.0     0.0
1            1     1       2     0.0     0.0
2            1     1       3     0.0     0.0
3            1     1       4     0.0     0.0
4            1     1       5     0.0     0.0
...        ...   ...     ...     ...     ...
17995       10    53      36     0.0     0.0
17996       10    53      37     0.0     0.0
17997       10    53      38     0.0     0.0
17998       10    53      39     0.0     0.0
17999       10    53      40     0.0     0.0

What I expect is:

       Session  Tube  Window  Counts_1  Length_1         Session  Counts_2  Length_2
0            1     1       1       0.0       0.0      0        2       0.0       0.0
1            1     1       2       0.0       0.0      1        2       0.0       0.0
2            1     1       3       0.0       0.0      2        2       0.0       0.0
3            1     1       4       0.0       0.0      3        2       0.0       0.0
4            1     1       5       0.0       0.0      4        2       0.0       0.0
...        ...   ...     ...       ...       ...    ...      ...       ...       ...
17995       10    53      36     0.0     0.0   

I could not find a solution. My attempt produced a completely wide dataset, with a single row per Session:

df['idx'] = df.groupby('Session').cumcount()+1
df = df.pivot_table(index=['Session'], columns='idx', 
                    values=['Counts', 'Length'], aggfunc='first')
df = df.sort_index(axis=1, level=1)
df.columns = [f'{x}_{y}' for x,y in df.columns]
df = df.reset_index()

   Session  Counts_1   Length_1  Counts_2   Length_2  Counts_3   Length_3  Counts_4   Length_4  Counts_5   Length_5  ...  Length_1795  Counts_1796  Length_1796  Counts_1797  Length_1797  Counts_1798  Length_1798  Counts_1799  Length_1799  Counts_1800  Length_1800
0        1       0.0   0.000000       0.0   0.000000       0.0   0.000000       0.0   0.000000       0.0   0.000000  ...     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000
1        2       0.0   0.000000       0.0   0.000000       0.0   0.000000       0.0   0.000000       0.0   0.000000  ...     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000
2        3       0.0   6.892889       0.0   2.503830       0.0   3.108580       0.0   5.188438       0.0   9.779242  ...     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000
3        4       1.0  12.787159       0.0  13.847412       7.0  44.928269       0.0  48.511435       2.0  33.264356  ...     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000
4        5       0.0  13.345436       2.0  27.415005      20.0  83.130315      19.0  85.475996       2.0  10.147958  ...     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000
5        6       2.0  13.141503       8.0  22.965002       5.0  48.737279      15.0  85.403915       1.0  17.414609  ...     0.000000          6.0    12.399834          0.0     0.710808          0.0     0.000000          0.0     1.661978          0.0     0.000000
6        7       1.0   7.852842       0.0  13.613426      14.0  46.148978      23.0  87.446535       0.0  13.759176  ...     2.231295          8.0    39.022340          1.0     7.304392          3.0     9.228959          0.0     6.885822          0.0     1.606200
7        8       0.0   0.884018       3.0  35.323813       8.0  32.846301      10.0  71.691744       0.0   4.310296  ...     2.753615          6.0    25.003670          6.0    22.113324          0.0     0.615790          0.0    11.812815          2.0     9.991712
8        9       4.0  24.700817      13.0  31.637755       3.0  30.312104       5.0  50.490115       0.0   3.830024  ...     5.977912         11.0    44.305738          1.0    13.523643          0.0     1.374856          1.0     9.066218          1.0     8.376995
9       10       0.0  17.651236      10.0  44.311858      29.0  55.415964      12.0  43.457016       1.0  41.503212  ...     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000          0.0     0.000000

1 answer:

Answer 0 (score: 1)

After building a custom index that numbers the rows within each session, you can pivot the DataFrame:

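# Keyword arguments are used because recent pandas versions no longer accept
# index/columns/values positionally in DataFrame.pivot.
df2 = df.assign(index=df.groupby(['Session']).cumcount()).pivot(
    index='index', columns='Session',
    values=['Tube', 'Window', 'Counts', 'Length']).rename_axis(index=None)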

With your sample data, it gives:

        Tube       Window       Counts      Length     
Session   1     10     1     10     1    10     1    10
0        1.0  53.0    1.0  36.0    0.0  0.0    0.0  0.0
1        1.0  53.0    2.0  37.0    0.0  0.0    0.0  0.0
2        1.0  53.0    3.0  38.0    0.0  0.0    0.0  0.0
3        1.0  53.0    4.0  39.0    0.0  0.0    0.0  0.0
4        1.0  53.0    5.0  40.0    0.0  0.0    0.0  0.0
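
To make the role of the custom index easier to see, here is a minimal sketch on a made-up toy frame (two sessions of three rows each; the values are invented for illustration and are not the asker's data). `cumcount()` numbers the rows inside each `Session` group, and that counter becomes the row index of the pivoted result:

```python
import pandas as pd

# Hypothetical toy data: 2 sessions, 3 rows each (not the real dataset).
toy = pd.DataFrame({
    "Session": [1, 1, 1, 2, 2, 2],
    "Tube":    [1, 1, 1, 1, 1, 1],
    "Window":  [1, 2, 3, 1, 2, 3],
    "Counts":  [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "Length":  [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
})

# Number the rows within each Session group: 0, 1, 2, 0, 1, 2.
row_in_session = toy.groupby("Session").cumcount()

# Pivot on that counter so every Session contributes its own set of columns.
wide = (
    toy.assign(index=row_in_session)
       .pivot(index="index", columns="Session",
              values=["Tube", "Window", "Counts", "Length"])
       .rename_axis(index=None)
)
print(wide)
```

The result has one row per position within a session and a two-level column index of the form `(variable, Session)`.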

Not bad, but the columns are a MultiIndex and they are in the wrong order. Let's go further:

df2.columns = df2.columns.to_flat_index()
df2 = df2.reindex(columns=sorted(df2.columns, key=lambda x: x[1]))

We now have:

   (Tube, 1)  (Window, 1)  ...  (Counts, 10)  (Length, 10)
0        1.0          1.0  ...           0.0           0.0
1        1.0          2.0  ...           0.0           0.0
2        1.0          3.0  ...           0.0           0.0
3        1.0          4.0  ...           0.0           0.0
4        1.0          5.0  ...           0.0           0.0

Last step:

df2 = df2.rename(columns=lambda x: '_'.join(str(i) for i in x))

Which finally gives:

   Tube_1  Window_1  Counts_1  ...  Window_10  Counts_10  Length_10
0     1.0       1.0       0.0  ...       36.0        0.0        0.0
1     1.0       2.0       0.0  ...       37.0        0.0        0.0
2     1.0       3.0       0.0  ...       38.0        0.0        0.0
3     1.0       4.0       0.0  ...       39.0        0.0        0.0
4     1.0       5.0       0.0  ...       40.0        0.0        0.0
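
For convenience, the steps above can be collected into one function. This is only a sketch under some assumptions: the name `long_to_wide`, the temporary `_row` column, and the toy data are hypothetical, and the column ordering is done by selecting an explicit list of `(variable, Session)` tuples rather than by the `reindex` call shown earlier.

```python
import pandas as pd

def long_to_wide(df, key="Session"):
    """Hypothetical helper: one block of <variable>_<key> columns per key value."""
    value_cols = [c for c in df.columns if c != key]
    wide = (
        df.assign(_row=df.groupby(key).cumcount())  # row number within each group
          .pivot(index="_row", columns=key, values=value_cols)
          .rename_axis(index=None)
    )
    # Order the columns session by session, keeping the original
    # variable order inside each session block.
    ordered = [(col, k) for k in sorted(df[key].unique()) for col in value_cols]
    wide = wide.loc[:, ordered]
    # Flatten the (variable, session) pairs into names such as "Counts_1".
    wide.columns = [f"{col}_{k}" for col, k in wide.columns]
    return wide

# Made-up toy data for illustration only.
toy = pd.DataFrame({
    "Session": [1, 1, 1, 2, 2, 2],
    "Tube":    [1, 1, 1, 1, 1, 1],
    "Window":  [1, 2, 3, 1, 2, 3],
    "Counts":  [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "Length":  [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
})
result = long_to_wide(toy)
print(result)
```

If the sessions do not all have the same number of rows, the shorter ones are padded with NaN in the extra rows.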