Pandas pivot with a MultiIndex, combining on a single column

Time: 2018-07-02 10:34:14

Tags: python pandas

I want to pivot a table on one column, using two other columns as the index.

Dataset:

uid     interaction date
1       like        2016-12-04
1       like        2016-12-05
1       comment     2016-12-05
1       like        2016-12-05
2       like        2016-12-04
2       like        2016-12-05
2       comment     2016-12-05
2       like        2016-12-05

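For each combination of uid and date, I want the number of interactions of each type that occurred for that uid on that date.

Desired result: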

uid     like    comment  date
1       1       0       2016-12-04
1       2       1       2016-12-05
2       1       0       2016-12-04
2       2       1       2016-12-05      

What I tried:

doc_social_interaction.pivot_table(index = ['uid','date'],columns = 'interaction', aggfunc=sum)

2 answers:

Answer 0: (score: 1)

You are close; pass aggfunc='size' so that the rows in each group are counted (this uses GroupBy.size):

df1 = df.pivot_table(index=['uid','date'],columns='interaction',aggfunc='size',fill_value=0)

Alternative solutions:

df1 = df.groupby(['uid','date','interaction']).size().unstack(fill_value=0)

df1 = df.groupby(['uid','date'])['interaction'].value_counts().unstack(fill_value=0)

df1 = pd.crosstab([df['uid'],df['date']], df['interaction'])

print (df1)
interaction     comment  like
uid date                     
1   2016-12-04        0     1
    2016-12-05        1     2
2   2016-12-04        0     1
    2016-12-05        1     2
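To make the snippets above easier to try, here is a minimal, self-contained sketch. The DataFrame is rebuilt from the sample data in the question (the column dtypes are an assumption, since the question does not state them), and the names by_pivot, by_groupby and by_crosstab are only illustrative. It checks that three of the approaches give the same counts:

```python
import pandas as pd

# Sample data copied from the question; dates are kept as plain strings (an assumption).
df = pd.DataFrame({
    "uid": [1, 1, 1, 1, 2, 2, 2, 2],
    "interaction": ["like", "like", "comment", "like"] * 2,
    "date": ["2016-12-04", "2016-12-05", "2016-12-05", "2016-12-05"] * 2,
})

# Count rows per (uid, date, interaction) in three equivalent ways.
by_pivot = df.pivot_table(index=["uid", "date"], columns="interaction",
                          aggfunc="size", fill_value=0)
by_groupby = df.groupby(["uid", "date", "interaction"]).size().unstack(fill_value=0)
by_crosstab = pd.crosstab([df["uid"], df["date"]], df["interaction"])

# The three tables should hold identical counts.
same = (by_pivot.values.tolist()
        == by_groupby.values.tolist()
        == by_crosstab.values.tolist())
print(same)
print(by_groupby)
```

pd.crosstab is arguably the shortest form when only counts are needed, while the groupby version makes the counting step explicit.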

Finally, some data cleaning:

df1 = df1.reset_index().rename_axis(None, axis=1)
print (df1)
   uid        date  comment  like
0    1  2016-12-04        0     1
1    1  2016-12-05        1     2
2    2  2016-12-04        0     1
3    2  2016-12-05        1     2
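If the column order should match the desired result in the question (uid, like, comment, date), the flattened frame can be reindexed by column name. This is only a sketch on the question's sample data; the names counts and flat are illustrative, not part of the original answer:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "uid": [1, 1, 1, 1, 2, 2, 2, 2],
    "interaction": ["like", "like", "comment", "like"] * 2,
    "date": ["2016-12-04", "2016-12-05", "2016-12-05", "2016-12-05"] * 2,
})

counts = pd.crosstab([df["uid"], df["date"]], df["interaction"])

# Move uid/date back to ordinary columns and drop the "interaction" columns label.
flat = counts.reset_index().rename_axis(None, axis=1)

# Reorder the columns to match the layout asked for in the question.
flat = flat[["uid", "like", "comment", "date"]]
print(flat)
```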

Answer 1: (score: 0)

Another approach (my DataFrame is named test):

Step 1: add a constant column, then sum it in the pivot table:

test['constant'] =1
pd.pivot_table(test, index=['uid', 'date'], columns='interaction', values='constant', aggfunc='sum').fillna(0)

interaction     comment  like
uid date                     
1   2016-12-04      0.0   1.0
    2016-12-05      1.0   2.0
2   2016-12-04      0.0   1.0
    2016-12-05      1.0   2.0