Pandas: aggregating category counts per user

Time: 2017-03-27 07:18:22

Tags: python pandas

How can I use pandas to create a frequency count for each user in each category? I want this so that I can go on to build a utility matrix.

Input:

|   | **author** | **category** |
|---|---|---|
| 0 | A | movies |
| 1 | B | games |
| 2 | C | pics |
| 4 | A | movies |
| 5 | C | movies |
| 6 | B | games |




Desired output:

| **author** | **category** | **category count** |
|---|---|---|
| A | movies | 2 |
| B | games | 2 |
| C | movies | 1 |
| C | pics | 1 |

1 answer:

Answer 0 (score: 0)

You can use groupby together with size to get the number of rows for each combination of the author and category columns. The output is a Series with a MultiIndex:

print (df.groupby(['author','category']).size())
author  category
A       movies      2
B       games       2
C       movies      1
        pics        1
dtype: int64
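
To make the step above reproducible, here is a minimal sketch that rebuilds the question's data by hand (the literal values and the non-contiguous index are copied from the input table) and shows how the resulting MultiIndex Series can be queried. The variable name `counts` is only illustrative.

```python
import pandas as pd

# Rebuild the question's input; the index skips 3, as in the original table
df = pd.DataFrame(
    {
        'author': ['A', 'B', 'C', 'A', 'C', 'B'],
        'category': ['movies', 'games', 'pics', 'movies', 'movies', 'games'],
    },
    index=[0, 1, 2, 4, 5, 6],
)

# One row count per (author, category) pair, indexed by a two-level MultiIndex
counts = df.groupby(['author', 'category']).size()

# Look up a single pair with a tuple
print(counts.loc[('A', 'movies')])

# Look up all categories of one author with the first level only
print(counts.loc['C'])
```

Pairs that never occur in the data (for example A with games) are simply absent from this Series rather than reported as 0.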

Then add reset_index to turn the MultiIndex levels into columns and to set a name for the value column. The output is a DataFrame:

# use a new name so the original df stays available for the examples that follow
counts = df.groupby(['author','category']).size().reset_index(name='category count')
print(counts)
  author category  category count
0      A   movies               2
1      B    games               2
2      C   movies               1
3      C     pics               1
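
The `name` argument matters here because the Series returned by size has no name of its own. The following sketch, which assumes the same six-row input as the question and uses illustrative variable names, compares the two calls:

```python
import pandas as pd

df = pd.DataFrame({
    'author': ['A', 'B', 'C', 'A', 'C', 'B'],
    'category': ['movies', 'games', 'pics', 'movies', 'movies', 'games'],
})

sizes = df.groupby(['author', 'category']).size()

# Without name=, the count column of an unnamed Series is labelled 0
default_cols = sizes.reset_index()

# With name=, the count column gets a readable label
named_cols = sizes.reset_index(name='category count')

print(list(default_cols.columns))
print(list(named_cols.columns))
```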

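But if you need a cross-tabulation instead (one row per author and one column per category, which is the shape of a utility matrix), there are several solutions: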

# df here is the original author/category frame

# Option 1: groupby + size, then unstack to reshape; missing pairs become 0
df1 = df.groupby(['author','category']).size().unstack(fill_value=0)
print(df1)
category  games  movies  pics
author                       
A             0       2     0
B             2       0     0
C             0       1     1

# Option 2: crosstab
df1 = pd.crosstab(df['author'],df['category'])
print(df1)
category  games  movies  pics
author                       
A             0       2     0
B             2       0     0
C             0       1     1

# Option 3: pivot_table with aggfunc='size'
df1 = df.pivot_table(index='author',columns='category', aggfunc='size', fill_value=0)
print(df1)
category  games  movies  pics
author                       
A             0       2     0
B             2       0     0
C             0       1     1
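
As a quick sanity check, the three approaches can be run side by side on the question's data. This is only a sketch: the names `by_unstack`, `by_crosstab`, `by_pivot` and `utility` are illustrative, and converting to a NumPy array at the end is one assumed way of handing the result to downstream code as a utility matrix.

```python
import pandas as pd

df = pd.DataFrame({
    'author': ['A', 'B', 'C', 'A', 'C', 'B'],
    'category': ['movies', 'games', 'pics', 'movies', 'movies', 'games'],
})

by_unstack = df.groupby(['author', 'category']).size().unstack(fill_value=0)
by_crosstab = pd.crosstab(df['author'], df['category'])
by_pivot = df.pivot_table(index='author', columns='category',
                          aggfunc='size', fill_value=0)

# Compare row labels, column labels and cell values against the first table
same = all(
    t.index.equals(by_unstack.index)
    and t.columns.equals(by_unstack.columns)
    and bool((t.to_numpy() == by_unstack.to_numpy()).all())
    for t in (by_crosstab, by_pivot)
)
print(same)

# Rows are authors, columns are categories, cells are counts
utility = by_crosstab.to_numpy()
print(utility)
```

On data this small the choice is mostly a matter of taste; crosstab is the shortest to write, while the groupby form is convenient when the long-format counts are needed as well.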

Edit:

See also the related question: What is the difference between size and count in pandas?
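
In short, size counts every row in a group, while count counts only the non-missing values of a column. A small sketch of the difference follows; the `rating` column and its values are hypothetical and are not part of the question's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'author': ['A', 'A', 'B'],
    'rating': [5.0, np.nan, 3.0],  # hypothetical column with one missing value
})

# size: number of rows per group, missing values included
sizes = df.groupby('author').size()

# count: number of non-missing 'rating' values per group
counts = df.groupby('author')['rating'].count()

print(sizes)
print(counts)
```

For the author/category data in the question there are no missing values, so both would give the same numbers; size is used above because it needs no value column.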