I'm currently working with a large movie dataset, which I have filtered down to two columns: Genre and Gender.
To illustrate:
Genre: Gender:
Romance Male
Tech Male
Romance Male
Comedy Female
Tech Female
Comedy Male
Romance Female
Romance Male
I want to show the 3 most-watched genres for each gender, but I can't seem to find the right code.
What I've tried:
df_final_gender['name'].groupby(df_final_gender['GENDER']).describe()
This only shows the single most-watched (top) genre for each gender. I want the top 3 for each gender. Thanks for your help!
Answer 0: (score: 1)
Using the following DataFrame:
import pandas as pd
df = pd.DataFrame({
'Genre': ['Romance', 'Tech', 'Romance', 'Comedy', 'Tech', 'Comedy', 'Romance', 'Romance',],
'Gender': ['Male', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female', 'Male',]})
Add an extra column to count on:
df['value'] = 1
This gives you:
Genre Gender value
0 Romance Male 1
1 Tech Male 1
2 Romance Male 1
3 Comedy Female 1
4 Tech Female 1
5 Comedy Male 1
6 Romance Female 1
7 Romance Male 1
Then group by both fields, Genre and Gender, and get the counts:
counts = df.groupby(['Genre', 'Gender']).count()
Output:
value
Genre Gender
Comedy Female 1
Male 1
Romance Female 1
Male 3
Tech Female 1
Male 1
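The helper column is not strictly required. As a minimal sketch on the same sample data, `size()` counts the rows in each group directly; the name `counts_alt` is only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Genre': ['Romance', 'Tech', 'Romance', 'Comedy', 'Tech', 'Comedy', 'Romance', 'Romance'],
    'Gender': ['Male', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female', 'Male']})

# size() returns the number of rows per (Genre, Gender) group as a Series;
# to_frame('value') turns it into a one-column frame with the same layout as above.
counts_alt = df.groupby(['Genre', 'Gender']).size().to_frame('value')
print(counts_alt)
```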
You can sort (using a name that does not shadow the built-in sorted):
sorted_counts = counts.sort_values(by='value', ascending=False)
And plot:
sorted_counts.plot(kind='bar', figsize=(15,8))
That should help.
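The steps above sort all genre/gender pairs together, so they do not yet limit the result to three genres per gender. One possible way to get there, sketched under the assumption that the same `value` helper column is used, is to sort first and then take the first three rows of each Gender group. The name `top3` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Genre': ['Romance', 'Tech', 'Romance', 'Comedy', 'Tech', 'Comedy', 'Romance', 'Romance'],
    'Gender': ['Male', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female', 'Male']})
df['value'] = 1

# Count rows per (Genre, Gender) pair, as in the answer.
counts = df.groupby(['Genre', 'Gender']).count()

# Sort by count, then keep the first 3 rows within each Gender level.
# head(3) preserves the sorted order, so each gender keeps its largest counts.
top3 = (counts.sort_values(by='value', ascending=False)
              .groupby(level='Gender')
              .head(3))
print(top3)
```

Ties are kept in whatever order the sort leaves them, so with many equally popular genres the third slot is arbitrary.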
Answer 1: (score: 0)
Slice by column, then run the function pd.Series.value_counts() on it:
df["Gender"].value_counts()
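As written, that line counts rows per gender rather than genres within each gender. A hedged sketch of how the same slicing idea could be extended to the question: select the Genre values for one gender at a time and call `value_counts()` on that slice. The dictionary name `top3_by_gender` is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'Genre': ['Romance', 'Tech', 'Romance', 'Comedy', 'Tech', 'Comedy', 'Romance', 'Romance'],
    'Gender': ['Male', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female', 'Male']})

# For each gender, slice out its Genre column and count the values.
# value_counts() sorts in descending order, so head(3) is the top 3.
top3_by_gender = {
    gender: df.loc[df['Gender'] == gender, 'Genre'].value_counts().head(3)
    for gender in df['Gender'].unique()
}

for gender, top in top3_by_gender.items():
    print(gender)
    print(top)
```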