Question

I have a DataFrame that looks roughly like this:

import pandas as pd

data = [
    {'user_id': 1, 'week': 1, 'score': 1},
    {'user_id': 1, 'week': 2, 'score': 2},
    {'user_id': 1, 'week': 2, 'score': 3},
    {'user_id': 2, 'week': 1, 'score': 1},
    {'user_id': 2, 'week': 1, 'score': 1}]
df = pd.DataFrame(data)

+---------+------+-------+
| user_id | week | score |
+---------+------+-------+
|       1 |    1 |     1 |
|       1 |    2 |     2 |
|       1 |    2 |     3 |
|       2 |    1 |     1 |
|       2 |    1 |     1 |
+---------+------+-------+

I want to group it by user_id and week, but then spread the scores within each group into separate columns, so that the resulting DataFrame looks like this:

+---------+------+--------+--------+
| user_id | week | score1 | score2 |
+---------+------+--------+--------+
|       1 |    1 |      1 |        |
|       1 |    2 |      2 |      3 |
|       2 |    1 |      1 |      1 |
+---------+------+--------+--------+

The grouping itself is straightforward:

df.groupby(['user_id', 'week'], as_index=False)

But I can't see how to do the reshaping.

Answer 1

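You can combine groupby.cumcount() with assign(), set_index(), and unstack(). cumcount() numbers the rows within each group (0, 1, ...), and unstack() turns that counter into columns. Groups with fewer scores are padded with NaN, which is why the scores come out as floats: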

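# Number the rows within each (user_id, week) group: 0, 1, ...
k = df.groupby(['user_id', 'week']).cumcount()
# Move that counter into the index, then pivot it out into columns.
m = df.assign(k=k).set_index(['user_id', 'week', 'k']).unstack()
# Flatten the ('score', k) MultiIndex columns into 'score_k'.
m.columns = [f'{a}_{b}' for a, b in m.columns]
print(m.reset_index())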

   user_id  week  score_0  score_1
0        1     1      1.0      NaN
1        1     2      2.0      3.0
2        2     1      1.0      1.0
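
The output above uses zero-based names (score_0, score_1) and floats, while the question asks for score1 and score2. A minimal sketch of one way to get closer to the requested layout, assuming a pandas version that supports the nullable "Int64" dtype; shifting the counter by one and the `wide` variable name are my own choices, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "week":    [1, 2, 2, 1, 1],
    "score":   [1, 2, 3, 1, 1],
})

# Position of each row within its (user_id, week) group, shifted to start at 1.
k = df.groupby(["user_id", "week"]).cumcount() + 1

wide = (
    df.assign(k=k)
      .set_index(["user_id", "week", "k"])["score"]
      .unstack()                    # one column per position k
      .astype("Int64")              # nullable integers: missing slots become <NA>
      .add_prefix("score")          # columns 1, 2 -> score1, score2
      .rename_axis(columns=None)    # drop the leftover 'k' label on the columns
      .reset_index()
)
print(wide)
```

Selecting the score column before unstack() keeps the columns flat, so no MultiIndex flattening step is needed.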

Answer 2

We can also use groupby + apply(list) followed by apply(pd.Series):

new_df = (
    df.groupby(['user_id', 'week'])['score']
      .apply(list)           # one list of scores per group
      .apply(pd.Series)      # expand each list into columns 0, 1, ...
      .add_prefix('score_')
      .reset_index()
)
print(new_df)

   user_id  week  score_0  score_1
0        1     1      1.0      NaN
1        1     2      2.0      3.0
2        2     1      1.0      1.0
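
Because apply(pd.Series) builds a separate Series for every group, it tends to be slow on large frames. A possible variation, sketched here as an assumption rather than as part of the original answer, hands the lists to the DataFrame constructor in one go; the `lists` variable name is illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "week":    [1, 2, 2, 1, 1],
    "score":   [1, 2, 3, 1, 1],
})

# One Python list of scores per (user_id, week) group.
lists = df.groupby(["user_id", "week"])["score"].agg(list)

# Build the wide frame from the lists directly; shorter lists are padded
# with missing values on the right.
new_df = (
    pd.DataFrame(lists.tolist(), index=lists.index)
      .add_prefix("score_")
      .reset_index()
)
print(new_df)
```

The column names match those of the apply(pd.Series) version, so the two should be interchangeable for this data.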

Stacking values within pandas groups into new columns
