I need to convert the following DataFrame:
class_id instructor_id
1 10
2 10
2 20
3 30
3 40
3 50
into this:
class_id instructor_id instructor_id_2 instructor_id_3
1 10
2 10 20
3 30 40 50
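The number of instructor_id columns should be determined dynamically by the number of instructor IDs associated with each class_id. The column names should keep following the same instructor_id_x pattern.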
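For anyone reproducing this, the sample input can be rebuilt as sketched below. The variable name `df` is an assumption chosen to match what the answers use, and `n_cols` is a hypothetical helper name showing how many instructor columns the widest class requires.

```python
import pandas as pd

# Sample input from the question (the name `df` is an assumption).
df = pd.DataFrame({
    "class_id": [1, 2, 2, 3, 3, 3],
    "instructor_id": [10, 10, 20, 30, 40, 50],
})

# The class with the most instructors decides how many
# instructor columns the reshaped result needs.
n_cols = df.groupby("class_id")["instructor_id"].size().max()
print(n_cols)
```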
Answer 0 (score: 2)
Use groupby + apply + list, followed by apply + pd.Series:
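import pandas as pd

# collect each class's instructors into a list, then expand each list into columns
df1 = df.groupby('class_id')['instructor_id'].apply(list).apply(pd.Series)
# alternative: df.groupby('class_id')['instructor_id'].apply(lambda x: pd.Series(x.tolist())).unstack()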
df1.columns = ['instructor_id']+['instructor_id_'+str(i+1) for i in df1.columns[1:]]
df1.reset_index(inplace=True)
print(df1)
class_id instructor_id instructor_id_2 instructor_id_3
0 1 10.0 NaN NaN
1 2 10.0 20.0 NaN
2 3 30.0 40.0 50.0
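The IDs print as floats because the missing cells are NaN, which forces a float dtype. If integer IDs matter, one possible variant is sketched below. It assumes a pandas version with the nullable `Int64` dtype, and it swaps `.apply(pd.Series)` for building the frame directly from the lists, which is a different construction from the one in this answer. The name `wide` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "class_id": [1, 2, 2, 3, 3, 3],
    "instructor_id": [10, 10, 20, 30, 40, 50],
})

# One list of instructor IDs per class.
lists = df.groupby("class_id")["instructor_id"].apply(list)

# Build the wide frame from the ragged lists; short rows are padded with NaN.
wide = pd.DataFrame(lists.tolist(), index=lists.index)

# Name the columns instructor_id, instructor_id_2, instructor_id_3, ...
wide.columns = ["instructor_id"] + [f"instructor_id_{i + 1}" for i in wide.columns[1:]]

# Nullable integers keep 10 as 10 and show missing cells as <NA>.
wide = wide.astype("Int64").reset_index()
print(wide)
```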
Answer 1 (score: 1)
groupby + cumcount + unstack
Here is one way, using a helper Series named key:
key = df.groupby('class_id')['instructor_id'].cumcount()\
.add(1).map('Instructor_{}'.format)
res = df.set_index(['class_id', key]).unstack().reset_index()
# clean up column names
res.columns = res.columns.droplevel(0)
res = res.rename(columns={'': 'class_id'})
print(res)
class_id Instructor_1 Instructor_2 Instructor_3
0 1 10.0 NaN NaN
1 2 10.0 20.0 NaN
2 3 30.0 40.0 50.0
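The output above uses Instructor_N labels rather than the instructor_id_x names the question asked for. A minimal sketch of the same cumcount idea with those names follows. It uses `pivot` instead of `set_index` + `unstack`, so it is a variation rather than this answer's exact method, and `pos` is a hypothetical helper column name.

```python
import pandas as pd

df = pd.DataFrame({
    "class_id": [1, 2, 2, 3, 3, 3],
    "instructor_id": [10, 10, 20, 30, 40, 50],
})

# Position of each instructor within its class: 1, 2, 3, ...
pos = df.groupby("class_id").cumcount() + 1

# Spread the positions across columns, one row per class.
res = df.assign(pos=pos).pivot(index="class_id", columns="pos", values="instructor_id")

# First column keeps the bare name; later ones get a numeric suffix.
res.columns = ["instructor_id" if i == 1 else f"instructor_id_{i}" for i in res.columns]
res = res.reset_index()
print(res)
```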