How can I use pandas to create a frequency count for each user in each category? I want this so that I can go on to build a utility matrix.
| | **author** | **category** |
|--|--|--|
| 0 | A | movies |
| 1 | B | games |
| 2 | C | pics |
| 4 | A | movies |
| 5 | C | movies |
| 6 | B | games |

| **author** | **category** | **category count** |
|--|--|--|
| A | movies | 2 |
| B | games | 2 |
| C | movies | 1 |
| C | pics | 1 |
Answer 0 (score: 0)
You can use groupby together with size to count the rows for every combination of the columns author and category. The output is a Series with a MultiIndex:
print (df.groupby(['author','category']).size())
author  category
A       movies      2
B       games       2
C       movies      1
        pics        1
dtype: int64
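To reproduce this step end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption rebuilt from the sample table in the question (including its index, which skips 3), and the variable name `counts` is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed; index 3 is absent there too)
df = pd.DataFrame(
    {
        "author": ["A", "B", "C", "A", "C", "B"],
        "category": ["movies", "games", "pics", "movies", "movies", "games"],
    },
    index=[0, 1, 2, 4, 5, 6],
)

# size() counts the rows in each (author, category) group
counts = df.groupby(["author", "category"]).size()

# The result is a Series keyed by a two-level MultiIndex
print(type(counts).__name__)        # Series
print(counts.index.nlevels)         # 2
print(counts.loc[("A", "movies")])  # 2
```

Individual counts can be looked up with a tuple of (author, category), as in the last line.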
Then add reset_index to turn the MultiIndex levels into columns and to set a name for the value column. The output is a DataFrame:
# keep the original df intact; the reshaping examples further down still need it
df_counts = df.groupby(['author','category']).size().reset_index(name='category count')
print (df_counts)
  author category  category count
0      A   movies               2
1      B    games               2
2      C   movies               1
3      C     pics               1
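As a runnable sketch of the reset_index step, the snippet below rebuilds the sample data (an assumption based on the question's table) and stores the result under a separate, illustrative name so the original frame stays available.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed)
df = pd.DataFrame(
    {
        "author": ["A", "B", "C", "A", "C", "B"],
        "category": ["movies", "games", "pics", "movies", "movies", "games"],
    }
)

# Count rows per (author, category), then flatten the MultiIndex into columns
counts = (
    df.groupby(["author", "category"])
      .size()
      .reset_index(name="category count")
)

print(list(counts.columns))  # ['author', 'category', 'category count']

# The column name contains a space, so bracket access is needed
print(counts.loc[counts["author"] == "C", "category count"].tolist())  # [1, 1]
```

Because the counts are now an ordinary column, the usual filtering, sorting, and merging operations apply directly.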
But if you need a crosstab, there are several possible solutions:
# add unstack to reshape the counts into a wide table
df1 = df.groupby(['author','category']).size().unstack(fill_value=0)
print (df1)
category games movies pics
author
A 0 2 0
B 2 0 0
C 0 1 1
df1 = pd.crosstab(df['author'],df['category'])
print (df1)
category games movies pics
author
A 0 2 0
B 2 0 0
C 0 1 1
df1 = df.pivot_table(index='author',columns='category', aggfunc='size', fill_value=0)
print (df1)
category games movies pics
author
A 0 2 0
B 2 0 0
C 0 1 1
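Since the goal in the question is a utility matrix, the wide table can be handed on as a plain array. The sketch below is one possible way to do that, assuming the same sample data as in the question; the names `by_unstack`, `by_crosstab`, and `utility` are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed)
df = pd.DataFrame(
    {
        "author": ["A", "B", "C", "A", "C", "B"],
        "category": ["movies", "games", "pics", "movies", "movies", "games"],
    }
)

# Two of the approaches shown above; both give an author x category count table
by_unstack = df.groupby(["author", "category"]).size().unstack(fill_value=0)
by_crosstab = pd.crosstab(df["author"], df["category"])

# Rows are authors, columns are categories, values are counts
utility = by_crosstab.to_numpy()
print(utility.shape)                   # (3, 3)
print(by_crosstab.loc["C", "movies"])  # 1
```

Keeping the DataFrame around alongside the array preserves the row and column labels, which are lost once only the raw matrix is kept.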
Edit: