Trying to find the most-watched genres for each gender from a DataFrame with two columns, "Genre" and "Gender"

Asked: 2019-04-23 15:37:57

Tags: python

I'm currently working with a large movie dataset, which I have filtered down to two columns: Genre and Gender.

To visualize this:

Genre:        Gender:
Romance       Male
Tech          Male
Romance       Male
Comedy        Female
Tech          Female
Comedy        Male
Romance       Female
Romance       Male

I want to show the 3 most-watched genres for each gender, but I can't seem to find the right code.

What I have tried:

df_final_gender['name'].groupby(df_final_gender['GENDER']).describe()

This only shows the single most-watched (top) genre for each gender. I want the top 3 for each gender. Thanks for your help!

2 Answers:

Answer 0: (score: 1)

Using the following DataFrame:

import pandas as pd

df = pd.DataFrame({
    'Genre': ['Romance', 'Tech', 'Romance', 'Comedy', 'Tech', 'Comedy', 'Romance', 'Romance'],
    'Gender': ['Male', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female', 'Male']})

Add an extra column to count on:

df['value'] = 1

This gives you:

    Genre   Gender  value
0   Romance Male    1
1   Tech    Male    1
2   Romance Male    1
3   Comedy  Female  1
4   Tech    Female  1
5   Comedy  Male    1
6   Romance Female  1
7   Romance Male    1

Then group by both fields, Genre and Gender, and get the counts:

counts = df.groupby(['Genre', 'Gender']).count()

Output:

                value
Genre   Gender  
Comedy  Female  1
        Male    1
Romance Female  1
        Male    3
Tech    Female  1
        Male    1
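For a side-by-side view of the same counts, pd.crosstab can build the Genre-by-Gender table directly, without the helper column. This is only a sketch on the sample data from this answer; the variable name table is my own choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Genre': ['Romance', 'Tech', 'Romance', 'Comedy', 'Tech', 'Comedy', 'Romance', 'Romance'],
    'Gender': ['Male', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female', 'Male'],
})

# Rows are genres, columns are genders, cells are the number of matching rows.
# "table" is a hypothetical name used only for this illustration.
table = pd.crosstab(df['Genre'], df['Gender'])
print(table)
```

Each column of the result can then be sorted on its own, for example table['Male'].sort_values(ascending=False).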

You can sort:

# named sorted_counts so the built-in sorted() is not shadowed
sorted_counts = counts.sort_values(by='value', ascending=False)

And plot:

sorted_counts.plot(kind='bar', figsize=(15,8))

That should help you.
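The sorted counts mix both genders together, so they do not directly give the top 3 per gender that the question asks for. One possible way to get there, sketched on the same sample data, is to sort within each gender and keep the first 3 rows of every group. The column name views and the variable top3 are my own assumptions, not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    'Genre': ['Romance', 'Tech', 'Romance', 'Comedy', 'Tech', 'Comedy', 'Romance', 'Romance'],
    'Gender': ['Male', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female', 'Male'],
})

# Count rows per (Gender, Genre) pair; size() needs no helper column.
# "views" is a hypothetical column name chosen for this sketch.
counts = df.groupby(['Gender', 'Genre']).size().reset_index(name='views')

# Sort by gender, then by count descending, and keep at most 3 rows per gender.
top3 = (
    counts.sort_values(['Gender', 'views'], ascending=[True, False])
          .groupby('Gender')
          .head(3)
)
print(top3)
```

The sample data only has three genres, so every genre shows up for both genders; on the full dataset head(3) would cut each gender down to its three most frequent genres. Ties are kept in whatever order the sort leaves them.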


Answer 1: (score: 0)

Slice out the column, then call pd.Series.value_counts() on it, for example: df["Gender"].value_counts()
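On its own, df["Gender"].value_counts() counts rows per gender rather than genres within each gender. A minimal sketch of how value_counts might be combined with groupby to get at the question's top 3, assuming the sample data from the other answer (the names per_gender and top3_per_gender are mine):

```python
import pandas as pd

df = pd.DataFrame({
    'Genre': ['Romance', 'Tech', 'Romance', 'Comedy', 'Tech', 'Comedy', 'Romance', 'Romance'],
    'Gender': ['Male', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female', 'Male'],
})

# Count genres separately inside each gender; the result is a Series indexed
# by (Gender, Genre), sorted by count in descending order within each gender.
per_gender = df.groupby('Gender')['Genre'].value_counts()

# Keep the first 3 entries of each gender (index level 0 is Gender).
top3_per_gender = per_gender.groupby(level=0).head(3)
print(top3_per_gender)
```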