My DataFrame looks like this:
ID START END SEQ
1 11 12 1
1 14 15 3
1 13 14 2
2 10 14 1
3 11 15 1
3 16 17 2
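To make the question reproducible, the input above could be built roughly as follows. This is only a sketch: the integer dtypes and the default index are assumptions, since the original post shows just the printed table.

```python
import pandas as pd

# Sample input from the question; the dtypes are assumed to be plain integers.
df = pd.DataFrame({
    "ID":    [1, 1, 1, 2, 3, 3],
    "START": [11, 14, 13, 10, 11, 16],
    "END":   [12, 15, 14, 14, 15, 17],
    "SEQ":   [1, 3, 2, 1, 1, 2],
})
print(df)
```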
I need to convert it into this DataFrame:
ID START_1 END_1 SEQ_1 START_2 END_2 SEQ_2 START_3 END_3 SEQ_3
1 11 12 1 13 14 2 14 15 3
2 10 14 1 NA NA NA NA NA NA
3 11 15 1 16 17 2 NA NA NA
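The problem is that the number of rows sharing the same ID is not known in advance, so the maximum number of START_X, END_X and SEQ_X columns should not be defined by hand.
Is there an automatic way to do this conversion, given that the columns should be ordered by SEQ?
Should I use group_by, or some other method?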
Answer 0 (score: 1)
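You can use groupby with unstack, then sort_index on the resulting MultiIndex columns, and finally flatten the MultiIndex into plain column names with a list comprehension:

# copy SEQ so it can serve as a grouping key and still remain a value column
df['SEQ1'] = df.SEQ
df = df.groupby(['ID','SEQ1']).mean().unstack()
# order the columns by the SEQ1 level
df = df.sort_index(axis=1, level=1)
# join the two column levels into names such as START_1
df.columns = ['_'.join((col[0], str(col[1]))) for col in df.columns]
print(df)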
    START_1  END_1  SEQ_1  START_2  END_2  SEQ_2  START_3  END_3  SEQ_3
ID
1      11.0   12.0    1.0     13.0   14.0    2.0     14.0   15.0    3.0
2      10.0   14.0    1.0      NaN    NaN    NaN      NaN    NaN    NaN
3      11.0   15.0    1.0     16.0   17.0    2.0      NaN    NaN    NaN
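The column order inside each SEQ group can depend on how sort_index treats the remaining level, so here is a self-contained sketch of the same groupby/unstack idea that sets the order explicitly with reindex. The names value_cols, tmp, seqs and order are illustrative and not part of the answer above, and the sample data is retyped from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ID":    [1, 1, 1, 2, 3, 3],
    "START": [11, 14, 13, 10, 11, 16],
    "END":   [12, 15, 14, 14, 15, 17],
    "SEQ":   [1, 3, 2, 1, 1, 2],
})

# Every column except the key is treated as a value column (assumption).
value_cols = [c for c in df.columns if c != "ID"]

# Copy SEQ into SEQ1 so SEQ can be both a grouping key and a value.
tmp = df.assign(SEQ1=df["SEQ"])
wide = tmp.groupby(["ID", "SEQ1"])[value_cols].mean().unstack()

# Build the wanted order: for each SEQ value, the value columns
# in their original order (START, END, SEQ).
seqs = sorted(wide.columns.get_level_values(1).unique())
order = pd.MultiIndex.from_tuples(
    [(col, seq) for seq in seqs for col in value_cols],
    names=wide.columns.names,
)
wide = wide.reindex(columns=order)

# Flatten the two column levels into names such as START_1.
wide.columns = [f"{col}_{seq}" for col, seq in wide.columns]
print(wide)
```

Because the set of SEQ values is read from the data, nothing about the maximum number of START_X columns is hard-coded.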
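Another solution uses pivot_table, whose default is aggfunc='mean':

# copy SEQ so it can serve as an index level and still remain a value column
df['SEQ1'] = df.SEQ
df = df.pivot_table(index= ['ID','SEQ1']).unstack()
# order the columns by the SEQ1 level
df = df.sort_index(axis=1, level=1)
# join the two column levels into names such as END_1
df.columns = ['_'.join((col[0], str(col[1]))) for col in df.columns]
print(df)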
    END_1  SEQ_1  START_1  END_2  SEQ_2  START_2  END_3  SEQ_3  START_3
ID
1    12.0    1.0     11.0   14.0    2.0     13.0   15.0    3.0     14.0
2    14.0    1.0     10.0    NaN    NaN      NaN    NaN    NaN      NaN
3    15.0    1.0     11.0   17.0    2.0     16.0    NaN    NaN      NaN
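Both variants above aggregate with the mean, so two rows with the same ID and SEQ would be averaged silently. If that is not wanted, a stricter sketch is to use set_index with unstack, which raises a ValueError on duplicate (ID, SEQ) pairs. The helper name widen_strict and the list value_cols are hypothetical, introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID":    [1, 1, 1, 2, 3, 3],
    "START": [11, 14, 13, 10, 11, 16],
    "END":   [12, 15, 14, 14, 15, 17],
    "SEQ":   [1, 3, 2, 1, 1, 2],
})

value_cols = ["START", "END", "SEQ"]

def widen_strict(frame):
    """Reshape to one row per ID without aggregating (hypothetical helper)."""
    # Copy SEQ into SEQ1 so it can be an index level and a value column.
    tmp = frame.assign(SEQ1=frame["SEQ"])
    # unstack raises ValueError if an (ID, SEQ1) pair occurs more than once.
    wide = tmp.set_index(["ID", "SEQ1"])[value_cols].unstack()
    # Order columns by SEQ value, then START, END, SEQ within each group.
    seqs = sorted(wide.columns.get_level_values(1).unique())
    order = pd.MultiIndex.from_tuples(
        [(col, seq) for seq in seqs for col in value_cols],
        names=wide.columns.names,
    )
    wide = wide.reindex(columns=order)
    # Flatten the two column levels into names such as START_1.
    wide.columns = [f"{col}_{seq}" for col, seq in wide.columns]
    return wide

wide = widen_strict(df)
print(wide)
```

Whether to aggregate or to fail on duplicates depends on the data: if each (ID, SEQ) pair is guaranteed to be unique, both approaches produce the same values.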
Another solution is: