I have a DataFrame:
u_id date social_interaction_type_id Total_Count
4 2018-08-19 4 5
4 2018-08-24 2 3
4 2018-08-21 1 4
I want to pivot the DataFrame on u_id and date, so that the result looks like this:
u_id date 4 2 1
4 2018-08-19 5 nan nan
4 2018-08-24 nan 3 nan
4 2018-08-21 nan nan 4
My attempt:
df.pivot(index = ['u_id','date'] , columns='social_interaction_type_id',values='Total_Count')
Error:
ValueError: Length of passed values is 8803, index implies 1
Answer 0 (score: 2)
df = (df.set_index(['u_id','date','social_interaction_type_id'])['Total_Count']
        .unstack()
        .reset_index()
        .rename_axis(None, axis=1))
print (df)
u_id date 1 2 4
0 4 2018-08-19 NaN NaN 5.0
1 4 2018-08-21 4.0 NaN NaN
2 4 2018-08-24 NaN 3.0 NaN
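For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the three rows in the question, and the variable name `wide` is only illustrative. Dates are kept as plain strings here, which is an assumption about the original data.

```python
import pandas as pd

# Sample data copied from the question (dates kept as strings for simplicity)
df = pd.DataFrame({
    "u_id": [4, 4, 4],
    "date": ["2018-08-19", "2018-08-24", "2018-08-21"],
    "social_interaction_type_id": [4, 2, 1],
    "Total_Count": [5, 3, 4],
})

wide = (
    df.set_index(["u_id", "date", "social_interaction_type_id"])["Total_Count"]
      .unstack()                    # last index level becomes the columns
      .reset_index()                # u_id and date become ordinary columns again
      .rename_axis(None, axis=1)    # drop the leftover columns name
)
print(wide)
```

Note that unstack sorts both the rows and the new columns, so the order differs from the layout sketched in the question.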
If the first two columns contain duplicates, aggregate with a function such as mean or sum, for example:
print (df)
u_id date social_interaction_type_id Total_Count
0 4 2018-08-19 4 5 <- 4 2018-08-19
1 4 2018-08-19 6 4 <- 4 2018-08-19
2 4 2018-08-24 2 3
3 4 2018-08-21 1 4
df2 = (df.groupby(['u_id','date','social_interaction_type_id'])['Total_Count']
         .mean()
         .unstack()
         .reset_index()
         .rename_axis(None, axis=1))
Or:
df2 = (df.pivot_table(index=['u_id','date'], columns='social_interaction_type_id', values='Total_Count')
         .reset_index()
         .rename_axis(None, axis=1))
print (df2)
u_id date 1 2 4 6
0 4 2018-08-19 NaN NaN 5.0 4.0
1 4 2018-08-21 4.0 NaN NaN NaN
2 4 2018-08-24 NaN 3.0 NaN NaN
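To show where aggregation becomes necessary, here is a small sketch using made-up data in which the full key (u_id, date, social_interaction_type_id) really does repeat. The second row and its count of 7 are invented for illustration and are not from the question. With such data a plain unstack should raise a ValueError about duplicate entries, while groupby or pivot_table collapse the repeats.

```python
import pandas as pd

# Hypothetical data: the first two rows share the same u_id, date AND type id
df = pd.DataFrame({
    "u_id": [4, 4, 4, 4],
    "date": ["2018-08-19", "2018-08-19", "2018-08-24", "2018-08-21"],
    "social_interaction_type_id": [4, 4, 2, 1],
    "Total_Count": [5, 7, 3, 4],
})

keys = ["u_id", "date", "social_interaction_type_id"]

# Without aggregation the duplicated key cannot be reshaped
try:
    df.set_index(keys)["Total_Count"].unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# Aggregate first, then reshape
by_mean = df.groupby(keys)["Total_Count"].mean().unstack()

# pivot_table does both steps at once; aggfunc defaults to the mean
by_sum = df.pivot_table(index=["u_id", "date"],
                        columns="social_interaction_type_id",
                        values="Total_Count",
                        aggfunc="sum")

print(unstack_failed)
print(by_mean)
print(by_sum)
```

Which aggregation to choose depends on what the repeated rows mean: sum fits counts that should be added up, mean fits repeated measurements.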
Answer 1 (score: 0)
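For reasons unknown to me, pd.DataFrame.pivot does not accept a list of values for index. According to the documentation, the optional index must be a string or object. One workaround is to use pd.DataFrame.pivot_table with aggfunc='first':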
res = df.pivot_table(index=['u_id', 'date'], columns='social_interaction_type_id',
                     values='Total_Count', aggfunc='first').reset_index()
print(res)
social_interaction_type_id u_id date 1 2 4
0 4 2018-08-19 NaN NaN 5.0
1 4 2018-08-21 4.0 NaN NaN
2 4 2018-08-24 NaN 3.0 NaN
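A runnable sketch of this workaround on the question's sample data follows. It also includes a direct df.pivot call with a list for index, on the assumption that pandas 1.1 or later is installed, since that release added list support for index; on older versions that line is expected to fail as in the question. The names `res` and `direct` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (dates kept as strings for simplicity)
df = pd.DataFrame({
    "u_id": [4, 4, 4],
    "date": ["2018-08-19", "2018-08-24", "2018-08-21"],
    "social_interaction_type_id": [4, 2, 1],
    "Total_Count": [5, 3, 4],
})

# Workaround: pivot_table accepts a list for index; 'first' keeps the
# first value seen for each (index, column) combination
res = df.pivot_table(index=["u_id", "date"],
                     columns="social_interaction_type_id",
                     values="Total_Count",
                     aggfunc="first")

# Assumption: pandas >= 1.1, where pivot also accepts a list for index
direct = df.pivot(index=["u_id", "date"],
                  columns="social_interaction_type_id",
                  values="Total_Count")

print(res.reset_index())
print(direct.reset_index())
```

Keep in mind that aggfunc='first' silently discards any later rows with the same key, whereas pivot raises an error on duplicates, so the two are only interchangeable when every key is unique.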