Dynamically convert multiple rows with a matching index into multiple columns using pandas

Time: 2018-11-21 16:29:46

Tags: python pandas pandas-groupby

I need to convert the following DataFrame:

class_id     instructor_id
 1                10
 2                10
 2                20
 3                30
 3                40
 3                50

into this:

class_id   instructor_id   instructor_id_2   instructor_id_3
 1             10
 2             10                   20
 3             30                   40                       50

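The number of instructor_id columns should be determined dynamically by the number of instructor IDs associated with each class_id. The instructor ID column names should keep following the same instructor_id_x pattern.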
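For anyone reproducing the problem, here is a minimal sketch of the input. The integer dtypes and the helper names `instructors_per_class` and `max_columns` are assumptions for illustration and do not come from the original post. It also shows how the required number of instructor columns follows from the largest group.

```python
import pandas as pd

# Sample input copied from the question; integer dtype is an assumption.
df = pd.DataFrame({
    "class_id": [1, 2, 2, 3, 3, 3],
    "instructor_id": [10, 10, 20, 30, 40, 50],
})

# Count instructors per class; the largest count is the number of
# instructor columns the wide result needs.
instructors_per_class = df.groupby("class_id")["instructor_id"].size()
max_columns = int(instructors_per_class.max())
```

With this data, class 3 has the most instructors, so the wide frame needs three instructor columns.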

2 Answers:

Answer 0 (score: 2)

Use groupby with apply(list), followed by apply(pd.Series):

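import pandas as pd

# collect each class's instructor IDs into a list, then expand the lists into columns
df1 = df.groupby('class_id')['instructor_id'].apply(list).apply(pd.Series)
# alternative: df.groupby('class_id')['instructor_id'].apply(lambda x: pd.Series(x.tolist())).unstack()
# keep the first column as instructor_id; name the rest instructor_id_2, instructor_id_3, ...
df1.columns = ['instructor_id']+['instructor_id_'+str(i+1) for i in df1.columns[1:]]
df1.reset_index(inplace=True)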

print(df1)
   class_id  instructor_id  instructor_id_2  instructor_id_3
0         1           10.0              NaN              NaN
1         2           10.0             20.0              NaN
2         3           30.0             40.0             50.0
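The values print as floats because the shorter lists are padded with NaN. As a hedged variant that is not part of the original answer, the sketch below builds the wide frame directly from the lists and casts it to pandas' nullable `Int64` dtype so the IDs stay integers. It assumes the sample data from the question and a pandas version that supports `Int64`; the names `lists` and `wide` are illustrative.

```python
import pandas as pd

# Sample input from the question (assumed integer dtype).
df = pd.DataFrame({
    "class_id": [1, 2, 2, 3, 3, 3],
    "instructor_id": [10, 10, 20, 30, 40, 50],
})

# One list of instructor IDs per class.
lists = df.groupby("class_id")["instructor_id"].apply(list)

# Build the wide frame from the ragged lists; short rows are padded with NaN.
wide = pd.DataFrame(lists.tolist(), index=lists.index)

# First column keeps the plain name, later ones get the _2, _3, ... suffix.
wide.columns = ["instructor_id"] + [
    f"instructor_id_{i + 1}" for i in wide.columns[1:]
]

# Nullable integers keep the IDs as ints and show missing slots as <NA>.
wide = wide.astype("Int64").reset_index()
```

Constructing the frame from `lists.tolist()` avoids creating one Series per row, which tends to matter on larger inputs.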

Answer 1 (score: 1)

groupby + cumcount + unstack

Here is one approach that uses a helper series, key:

key = df.groupby('class_id')['instructor_id'].cumcount()\
        .add(1).map('Instructor_{}'.format)

res = df.set_index(['class_id', key]).unstack().reset_index()

# clean up column names
res.columns = res.columns.droplevel(0)
res = res.rename(columns={'': 'class_id'})

print(res)

   class_id  Instructor_1  Instructor_2  Instructor_3
0         1          10.0           NaN           NaN
1         2          10.0          20.0           NaN
2         3          30.0          40.0          50.0
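Note that the columns here are named Instructor_1, Instructor_2, ... rather than following the pattern the question asked for. A possible adaptation, offered only as a sketch and not taken from the original answer, numbers the rows with cumcount, reshapes with pivot, and then renames the columns to instructor_id, instructor_id_2, and so on. The helper column name `pos` is an assumption.

```python
import pandas as pd

# Sample input from the question (assumed integer dtype).
df = pd.DataFrame({
    "class_id": [1, 2, 2, 3, 3, 3],
    "instructor_id": [10, 10, 20, 30, 40, 50],
})

# Position of each instructor within its class, starting at 1.
pos = df.groupby("class_id").cumcount() + 1

# One row per class, one column per position; missing slots become NaN.
wide = df.assign(pos=pos).pivot(
    index="class_id", columns="pos", values="instructor_id"
)

# Position 1 keeps the plain name, later positions get a numeric suffix.
wide.columns = [
    "instructor_id" if n == 1 else f"instructor_id_{n}" for n in wide.columns
]
wide = wide.reset_index()
```

Because the result has a single column level, no droplevel or rename step is needed afterwards.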