I want to reshape my long DataFrame into wide format, spreading it out by Session. In this example, Session runs from 1 to 10.
Session Tube Window Counts Length
0 1 1 1 0.0 0.0
1 1 1 2 0.0 0.0
2 1 1 3 0.0 0.0
3 1 1 4 0.0 0.0
4 1 1 5 0.0 0.0
... ... ... ... ... ...
17995 10 53 36 0.0 0.0
17996 10 53 37 0.0 0.0
17997 10 53 38 0.0 0.0
17998 10 53 39 0.0 0.0
17999 10 53 40 0.0 0.0
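To make the question reproducible, here is a minimal sketch that builds a small long-format frame with the same columns. The sizes (2 sessions, 1 tube, 3 windows) and the values are made up for illustration. The real data has 10 sessions and 18,000 rows.

```python
import pandas as pd

# Hypothetical toy data: 2 sessions x 1 tube x 3 windows, in long format
# (one row per Session/Tube/Window combination), mirroring the question's columns.
rows = []
for session in (1, 2):
    for window in (1, 2, 3):
        rows.append({"Session": session, "Tube": 1, "Window": window,
                     "Counts": float(session * window), "Length": 0.0})
df = pd.DataFrame(rows)
print(df)
```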
What I expect:
Session Tube Window Counts_1 Length_1 Session Counts_2 Length_2
0 1 1 1 0.0 0.0 0 2 0.0 0.0
1 1 1 2 0.0 0.0 1 2 0.0 0.0
2 1 1 3 0.0 0.0 2 2 0.0 0.0
3 1 1 4 0.0 0.0 3 2 0.0 0.0
4 1 1 5 0.0 0.0 4 2 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ...
17995 10 53 36 0.0 0.0
I can't find a solution. My attempt ended up producing a fully wide dataset, with one row per Session:
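# number the rows within each Session (1, 2, 3, ...)
df['idx'] = df.groupby('Session').cumcount() + 1
# one row per Session, one column per (variable, idx) pair
df = df.pivot_table(index=['Session'], columns='idx',
                    values=['Counts', 'Length'], aggfunc='first')
df = df.sort_index(axis=1, level=1)
# flatten the MultiIndex columns into labels such as 'Counts_1'
df.columns = [f'{x}_{y}' for x, y in df.columns]
df = df.reset_index()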
Session Counts_1 Length_1 Counts_2 Length_2 Counts_3 Length_3 Counts_4 Length_4 Counts_5 Length_5 ... Length_1795 Counts_1796 Length_1796 Counts_1797 Length_1797 Counts_1798 Length_1798 Counts_1799 Length_1799 Counts_1800 Length_1800
0 1 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 ... 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000
1 2 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 ... 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000
2 3 0.0 6.892889 0.0 2.503830 0.0 3.108580 0.0 5.188438 0.0 9.779242 ... 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000
3 4 1.0 12.787159 0.0 13.847412 7.0 44.928269 0.0 48.511435 2.0 33.264356 ... 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000
4 5 0.0 13.345436 2.0 27.415005 20.0 83.130315 19.0 85.475996 2.0 10.147958 ... 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000
5 6 2.0 13.141503 8.0 22.965002 5.0 48.737279 15.0 85.403915 1.0 17.414609 ... 0.000000 6.0 12.399834 0.0 0.710808 0.0 0.000000 0.0 1.661978 0.0 0.000000
6 7 1.0 7.852842 0.0 13.613426 14.0 46.148978 23.0 87.446535 0.0 13.759176 ... 2.231295 8.0 39.022340 1.0 7.304392 3.0 9.228959 0.0 6.885822 0.0 1.606200
7 8 0.0 0.884018 3.0 35.323813 8.0 32.846301 10.0 71.691744 0.0 4.310296 ... 2.753615 6.0 25.003670 6.0 22.113324 0.0 0.615790 0.0 11.812815 2.0 9.991712
8 9 4.0 24.700817 13.0 31.637755 3.0 30.312104 5.0 50.490115 0.0 3.830024 ... 5.977912 11.0 44.305738 1.0 13.523643 0.0 1.374856 1.0 9.066218 1.0 8.376995
9 10 0.0 17.651236 10.0 44.311858 29.0 55.415964 12.0 43.457016 1.0 41.503212 ... 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000
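The attempt above puts Session on the row index of pivot_table, so each session collapses into a single row and every position within the session becomes its own Counts/Length pair (1,800 pairs per row for 18,000 rows over 10 sessions). A minimal sketch on hypothetical toy data (2 sessions x 3 windows) shows the same effect:

```python
import pandas as pd

# Hypothetical toy data in the question's long format: 2 sessions x 3 windows.
df = pd.DataFrame({
    "Session": [1, 1, 1, 2, 2, 2],
    "Tube":    [1, 1, 1, 1, 1, 1],
    "Window":  [1, 2, 3, 1, 2, 3],
    "Counts":  [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "Length":  [0.0, 0.5, 1.0, 1.5, 2.0, 2.5],
})

# Same steps as the attempt: Session is the row index, so each session
# becomes one row and each within-session position becomes its own columns.
df["idx"] = df.groupby("Session").cumcount() + 1
wide = df.pivot_table(index=["Session"], columns="idx",
                      values=["Counts", "Length"], aggfunc="first")
print(wide.shape)  # (2, 6): 2 sessions x (2 variables * 3 positions)
```

The fix is to swap the roles: the within-session row number should become the row index and Session should go into the columns.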
Answer 0 (score: 1)
You can try pivoting the DataFrame after building a custom row index within each session:
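# number rows within each Session (0, 1, 2, ...) and pivot Session into the columns
df2 = df.assign(index=df.groupby(['Session']).cumcount()).pivot(
    index='index', columns='Session',
    values=['Tube', 'Window', 'Counts', 'Length']).rename_axis(index=None)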
With your sample data, this gives:
Tube Window Counts Length
Session 1 10 1 10 1 10 1 10
0 1.0 53.0 1.0 36.0 0.0 0.0 0.0 0.0
1 1.0 53.0 2.0 37.0 0.0 0.0 0.0 0.0
2 1.0 53.0 3.0 38.0 0.0 0.0 0.0 0.0
3 1.0 53.0 4.0 39.0 0.0 0.0 0.0 0.0
4 1.0 53.0 5.0 40.0 0.0 0.0 0.0 0.0
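For a self-contained check of the pivot step, here is a sketch on hypothetical toy data (2 sessions x 3 windows rather than the real 10 sessions). It passes index, columns and values to pivot as keywords, since pandas 2.0 no longer accepts them positionally.

```python
import pandas as pd

# Hypothetical toy data: 2 sessions x 3 windows.
df = pd.DataFrame({
    "Session": [1, 1, 1, 2, 2, 2],
    "Tube":    [1, 1, 1, 1, 1, 1],
    "Window":  [1, 2, 3, 1, 2, 3],
    "Counts":  [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "Length":  [0.0, 0.5, 1.0, 1.5, 2.0, 2.5],
})

# The row number within each Session becomes the new row index;
# Session moves into the columns as the second column level.
df2 = (
    df.assign(index=df.groupby("Session").cumcount())
      .pivot(index="index", columns="Session",
             values=["Tube", "Window", "Counts", "Length"])
      .rename_axis(index=None)
)
print(df2)
```

The result has one row per position within a session and (variable, Session) MultiIndex columns, matching the shape of the output above.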
Not bad, but the columns are now a MultiIndex, and they are in the wrong order. Let's go a step further:
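# flatten the MultiIndex into (variable, Session) tuples
df2.columns = df2.columns.to_flat_index()
# sort on Session only; sorted() is stable, so Tube/Window/Counts/Length keep their order
df2 = df2.reindex(columns=sorted(df2.columns, key=lambda x: x[1]))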
We now have:
(Tube, 1) (Window, 1) ... (Counts, 10) (Length, 10)
0 1.0 1.0 ... 0.0 0.0
1 1.0 2.0 ... 0.0 0.0
2 1.0 3.0 ... 0.0 0.0
3 1.0 4.0 ... 0.0 0.0
4 1.0 5.0 ... 0.0 0.0
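The reordering relies on Python's sorted() being stable: sorting only on the Session part of each tuple keeps Tube, Window, Counts, Length in their original order within each session. Because the Session values are integers, 10 also sorts after 2 numerically rather than as a string. A plain-Python sketch using the column labels from the output above:

```python
# Flat column labels after to_flat_index(): (variable, Session) tuples,
# in the order produced by the pivot (grouped by variable).
cols = [("Tube", 1), ("Tube", 10), ("Window", 1), ("Window", 10),
        ("Counts", 1), ("Counts", 10), ("Length", 1), ("Length", 10)]

# Sort on the Session part only; ties keep their original relative order.
ordered = sorted(cols, key=lambda x: x[1])
print(ordered)
# [('Tube', 1), ('Window', 1), ('Counts', 1), ('Length', 1),
#  ('Tube', 10), ('Window', 10), ('Counts', 10), ('Length', 10)]
```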
Last step:
# join each tuple into a single label, e.g. ('Counts', 1) -> 'Counts_1'
df2 = df2.rename(columns=lambda x: '_'.join(str(i) for i in x))
Which finally gives:
Tube_1 Window_1 Counts_1 ... Window_10 Counts_10 Length_10
0 1.0 1.0 0.0 ... 36.0 0.0 0.0
1 1.0 2.0 0.0 ... 37.0 0.0 0.0
2 1.0 3.0 0.0 ... 38.0 0.0 0.0
3 1.0 4.0 0.0 ... 39.0 0.0 0.0
4 1.0 5.0 0.0 ... 40.0 0.0 0.0
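The steps can be chained into one function. This is a sketch under assumptions: the name long_to_wide_by_session and its value_cols parameter are hypothetical, the data is a toy frame, and the column reordering uses a stable sort over column positions with iloc instead of the answer's reindex (the resulting order is the same).

```python
import pandas as pd


def long_to_wide_by_session(df, value_cols=("Tube", "Window", "Counts", "Length")):
    """Hypothetical helper chaining the answer's steps into one call."""
    # Row number within each Session becomes the row index; Session moves
    # into the columns, giving (variable, Session) MultiIndex columns.
    wide = (
        df.assign(index=df.groupby("Session").cumcount())
          .pivot(index="index", columns="Session", values=list(value_cols))
          .rename_axis(index=None)
    )
    # Stable sort of column positions by Session keeps the variable order.
    order = sorted(range(wide.shape[1]), key=lambda i: wide.columns[i][1])
    wide = wide.iloc[:, order]
    # Flatten ('Counts', 1) into 'Counts_1'.
    wide.columns = [f"{var}_{session}" for var, session in wide.columns]
    return wide


# Hypothetical toy data: 2 sessions x 3 windows.
df = pd.DataFrame({
    "Session": [1, 1, 1, 2, 2, 2],
    "Tube":    [1, 1, 1, 1, 1, 1],
    "Window":  [1, 2, 3, 1, 2, 3],
    "Counts":  [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "Length":  [0.0, 0.5, 1.0, 1.5, 2.0, 2.5],
})
result = long_to_wide_by_session(df)
print(result.columns.tolist())
```

If the sessions have different numbers of rows, pivot pads the shorter sessions with NaN rather than raising an error.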