I want to pivot a table on one column, using two columns as the index
Dataset:
uid interaction date
1 like 2016-12-04
1 like 2016-12-05
1 comment 2016-12-05
1 like 2016-12-05
2 like 2016-12-04
2 like 2016-12-05
2 comment 2016-12-05
2 like 2016-12-05
Using uid and date, I want the number of interactions of each type that occurred for a given uid on a given date.
Desired result:
uid like comment date
1 1 0 2016-12-04
1 2 1 2016-12-05
2 1 0 2016-12-04
2 2 1 2016-12-05
What I have tried:
doc_social_interaction.pivot_table(index = ['uid','date'],columns = 'interaction', aggfunc=sum)
Answer 0: (score: 1)
You are close; you need GroupBy.size to do the counting:
df1 = df.pivot_table(index=['uid','date'],columns='interaction',aggfunc='size',fill_value=0)
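For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's data and applies the same call. The DataFrame construction and the variable name counts are my own assumptions; only the pivot_table call comes from the answer above.

```python
import pandas as pd

# Rebuild the sample data from the question (dates kept as plain strings).
df = pd.DataFrame({
    "uid": [1, 1, 1, 1, 2, 2, 2, 2],
    "interaction": ["like", "like", "comment", "like"] * 2,
    "date": ["2016-12-04", "2016-12-05", "2016-12-05", "2016-12-05"] * 2,
})

# aggfunc="size" counts rows per (uid, date, interaction) group;
# fill_value=0 fills in combinations that never occur.
counts = df.pivot_table(
    index=["uid", "date"],
    columns="interaction",
    aggfunc="size",
    fill_value=0,
)
print(counts)
```

No values argument is needed here, because size counts rows rather than aggregating a particular column.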
Alternative solutions (each line produces the same table):
df1 = df.groupby(['uid','date','interaction']).size().unstack(fill_value=0)
df1 = df.groupby(['uid','date'])['interaction'].value_counts().unstack(fill_value=0)
df1 = pd.crosstab([df['uid'],df['date']], df['interaction'])
print (df1)
interaction comment like
uid date
1 2016-12-04 0 1
2016-12-05 1 2
2 2016-12-04 0 1
2016-12-05 1 2
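If you want to convince yourself that the three alternatives agree, a rough check along these lines should do. The sample DataFrame and the names by_size, by_value_counts and by_crosstab are assumptions of mine for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "uid": [1, 1, 1, 1, 2, 2, 2, 2],
    "interaction": ["like", "like", "comment", "like"] * 2,
    "date": ["2016-12-04", "2016-12-05", "2016-12-05", "2016-12-05"] * 2,
})

# 1) Count rows per (uid, date, interaction), then move interaction to columns.
by_size = df.groupby(["uid", "date", "interaction"]).size().unstack(fill_value=0)

# 2) Count interaction values within each (uid, date) group.
by_value_counts = (
    df.groupby(["uid", "date"])["interaction"].value_counts().unstack(fill_value=0)
)

# 3) Cross-tabulate (uid, date) against interaction.
by_crosstab = pd.crosstab([df["uid"], df["date"]], df["interaction"])

# Compare the raw counts of the three results.
print((by_size.values == by_value_counts.values).all())
print((by_size.values == by_crosstab.values).all())
```

Which one to pick is mostly a matter of taste; crosstab is the shortest, while groupby plus size makes the counting step explicit.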
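Finally, some data cleaning (pass axis as a keyword; newer pandas versions reject it as a positional argument):
df1 = df1.reset_index().rename_axis(None, axis=1)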
print (df1)
uid date comment like
0 1 2016-12-04 0 1
1 1 2016-12-05 1 2
2 2 2016-12-04 0 1
3 2 2016-12-05 1 2
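The column order above differs from the layout asked for in the question (uid, like, comment, date). A small sketch of the cleanup plus a reorder follows; the name tidy and the explicit column list are my additions, not part of the original answer.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "uid": [1, 1, 1, 1, 2, 2, 2, 2],
    "interaction": ["like", "like", "comment", "like"] * 2,
    "date": ["2016-12-04", "2016-12-05", "2016-12-05", "2016-12-05"] * 2,
})

counts = pd.crosstab([df["uid"], df["date"]], df["interaction"])

# Turn the (uid, date) index back into ordinary columns and drop the
# leftover "interaction" label on the column axis.
tidy = counts.reset_index().rename_axis(None, axis=1)

# Reorder the columns to match the layout requested in the question.
tidy = tidy[["uid", "like", "comment", "date"]]
print(tidy)
```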
Answer 1: (score: 0)
Another approach (my DataFrame is named test):
Step 1: add a constant column:
test['constant'] =1
Step 2: pivot on interaction and sum the constant:
pd.pivot_table(test, index=['uid', 'date'], columns='interaction', values='constant', aggfunc='sum').fillna(0)
interaction comment like
uid date
1 2016-12-04 0.0 1.0
2016-12-05 1.0 2.0
2 2016-12-04 0.0 1.0
2016-12-05 1.0 2.0
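The counts come out as floats because the missing combinations are NaN before fillna runs. If integer counts are preferred, one option is to cast after filling. This is only a sketch: the sample data and the name result are assumptions of mine, while test and constant follow the naming in this answer.

```python
import pandas as pd

# Sample data from the question, named "test" as in this answer.
test = pd.DataFrame({
    "uid": [1, 1, 1, 1, 2, 2, 2, 2],
    "interaction": ["like", "like", "comment", "like"] * 2,
    "date": ["2016-12-04", "2016-12-05", "2016-12-05", "2016-12-05"] * 2,
})

# Each row contributes 1 to the sum, so summing counts the rows.
test["constant"] = 1

result = (
    pd.pivot_table(
        test,
        index=["uid", "date"],
        columns="interaction",
        values="constant",
        aggfunc="sum",
    )
    .fillna(0)    # missing (uid, date, interaction) combinations become 0
    .astype(int)  # cast the float counts back to integers
)
print(result)
```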