I have a DataFrame that looks roughly like this:
import pandas as pd

data = [
    {'user_id': 1, 'week': 1, 'score': 1},
    {'user_id': 1, 'week': 2, 'score': 2},
    {'user_id': 1, 'week': 2, 'score': 3},
    {'user_id': 2, 'week': 1, 'score': 1},
    {'user_id': 2, 'week': 1, 'score': 1}]
df = pd.DataFrame(data)
+---------+------+-------+
| user_id | week | score |
+---------+------+-------+
| 1 | 1 | 1 |
| 1 | 2 | 2 |
| 1 | 2 | 3 |
| 2 | 1 | 1 |
| 2 | 1 | 1 |
+---------+------+-------+
I want to group it by user_id and week, but then spread the scores within each group into separate columns, so that the resulting DataFrame looks like this:
+---------+------+--------+--------+
| user_id | week | score1 | score2 |
+---------+------+--------+--------+
| 1 | 1 | 1 | |
| 1 | 2 | 2 | 3 |
| 2 | 1 | 1 | 1 |
+---------+------+--------+--------+
The grouping itself is easy:
df.groupby(['user_id', 'week'], as_index=False)
But I can't see how to do the reshaping.
Answer 0 (score: 3)
You can combine groupby.cumcount() with assign(), set_index(), and unstack():
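# Number the rows within each (user_id, week) group: 0, 1, ...
m = (df.assign(k=df.groupby(['user_id', 'week']).cumcount())
       .set_index(['user_id', 'week', 'k']).unstack())
# Flatten the ('score', k) MultiIndex columns into 'score_k'
m.columns = [f'{a}_{b}' for a, b in m.columns]
print(m.reset_index())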
user_id week score_0 score_1
0 1 1 1.0 NaN
1 1 2 2.0 3.0
2 2 1 1.0 1.0
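If the column names should match the score1, score2 layout asked for in the question, one possible variation is to start the per-group counter at 1 and unstack only the score column, so no MultiIndex columns need flattening. This is a minimal sketch, not a tested recipe for every pandas version; the names k and wide are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "week": [1, 2, 2, 1, 1],
    "score": [1, 2, 3, 1, 1],
})

# Position of each row inside its (user_id, week) group, starting at 1.
k = df.groupby(["user_id", "week"]).cumcount() + 1

wide = (
    df.assign(k=k)
      .set_index(["user_id", "week", "k"])["score"]  # Series with a 3-level index
      .unstack("k")                                   # one column per position
      .add_prefix("score")                            # 1 -> 'score1', 2 -> 'score2'
      .rename_axis(columns=None)                      # drop the leftover 'k' label
      .reset_index()
)
print(wide)
```

Groups with fewer rows than the largest group get NaN in the extra columns, which is why the scores come back as floats. If integer display matters, converting the score columns with astype("Int64") (pandas' nullable integer type) is one option.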
Answer 1 (score: 2)
We can also use groupby + apply(list) followed by apply(pd.Series):
new_df=( df.groupby(['user_id', 'week'])
.score
.apply(list)
.apply(pd.Series)
.add_prefix('score_')
.reset_index() )
print(new_df)
user_id week score_0 score_1
0 1 1 1.0 NaN
1 1 2 2.0 3.0
2 2 1 1.0 1.0
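A note on the approach above: apply(pd.Series) builds one Series per group, which tends to get slow on large frames. A possible alternative, sketched here under the assumption that the per-group lists fit in memory, is to hand the lists to the DataFrame constructor in one go. The names lists and wide are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "week": [1, 2, 2, 1, 1],
    "score": [1, 2, 3, 1, 1],
})

# One Python list of scores per (user_id, week) group.
lists = df.groupby(["user_id", "week"])["score"].agg(list)

# Build the wide frame from the ragged lists in a single constructor call;
# shorter lists are padded with missing values.
wide = (
    pd.DataFrame(lists.tolist(), index=lists.index)
      .add_prefix("score_")
      .reset_index()
)
print(wide)
```

The result has the same layout as the output above (user_id, week, score_0, score_1). Whether it is actually faster depends on the data, so it is worth timing on a realistic sample.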