How to find the most frequent element per group in pandas

Time: 2018-10-05 15:20:17

Tags: pandas pandas-groupby

I have a DataFrame grouped by year and genre:

new_df = df1.groupby(['release_year', 'genres'])[['release_year', 'genres', 'budget_adj']]

new_df.info()

My current table looks like this:

                        release_year  genres  budget_adj
                               count   count       count
release_year  genres
1960          Action               7       7           7
              Adventure            5       5           5
              Comedy               7       7           7
              Crime                2       2           2
1961          Action               7       7           7
              Adventure            6       6           6
              Animation            1       1           1
              Comedy               8       8           8

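and so on.

I want to find the genre with the most films produced in each year. How can I write a pandas query for this?

2 answers:

Answer 0: (score: 0)

You can do it like this: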

df.loc[df.loc[:,('genres','count')].groupby(level=0)
                                   .rank(ascending=False, method='dense')
                                   .loc[lambda x: x==1].index]

Output:

            release_year genres budget_adj
                   count  count      count
1960 Action            7      7          7
     Comedy            7      7          7
1961 Comedy            8      8          8
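
To make the steps in this answer easier to follow, here is a minimal self-contained sketch. It rebuilds the counts table from the numbers shown in the question (the construction of `df` is an assumption, since the original data is not available), ranks the genre counts within each year, and keeps the rows ranked first. With `method='dense'`, tied genres share rank 1, which is why both Action and Comedy are returned for 1960. A boolean mask is used here in place of the index lookup from the answer; the selected rows should be the same.

```python
import pandas as pd

# Assumed sample data: the counts shown in the question's table.
counts = {
    (1960, "Action"): 7, (1960, "Adventure"): 5,
    (1960, "Comedy"): 7, (1960, "Crime"): 2,
    (1961, "Action"): 7, (1961, "Adventure"): 6,
    (1961, "Animation"): 1, (1961, "Comedy"): 8,
}
index = pd.MultiIndex.from_tuples(list(counts), names=["release_year", "genres"])
columns = pd.MultiIndex.from_product(
    [["release_year", "genres", "budget_adj"], ["count"]]
)
df = pd.DataFrame([[n, n, n] for n in counts.values()], index=index, columns=columns)

# Step 1: pick the single column that holds the per-genre counts (a Series).
genre_counts = df.loc[:, ("genres", "count")]

# Step 2: rank the counts within each year (index level 0), highest first.
ranks = genre_counts.groupby(level=0).rank(ascending=False, method="dense")

# Step 3: keep only the rows whose rank is 1, ties included.
top = df.loc[ranks.eq(1)]
print(top)
```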

Answer 1: (score: 0)

df.columns = df.columns.map('_'.join)
df.reset_index().groupby(['release_year']).apply(lambda x: x[x.genres_count == x.genres_count.max()])

Output:

                 release_year  genres  release_year_count  genres_count  budget_adj_count
release_year
1960         0           1960  Action                   7             7                 7
             2           1960  Comedy                   7             7                 7
1961         7           1961  Comedy                   8             8                 8
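
As a possible alternative to `apply`, the same selection can be sketched with `transform('max')`, which broadcasts each year's maximum back onto every row so that it can be compared directly. The flat table and the column name `n_films` below are assumptions made for illustration; they are not part of the original data. If only one genre per year is wanted, `idxmax` returns the first row that reaches the maximum and so drops ties.

```python
import pandas as pd

# Assumed flat version of the counts from the question; `n_films` is a made-up name.
flat = pd.DataFrame({
    "release_year": [1960] * 4 + [1961] * 4,
    "genres": ["Action", "Adventure", "Comedy", "Crime",
               "Action", "Adventure", "Animation", "Comedy"],
    "n_films": [7, 5, 7, 2, 7, 6, 1, 8],
})

# Broadcast each year's maximum count to every row of that year.
yearly_max = flat.groupby("release_year")["n_films"].transform("max")

# Keep every genre that reaches the yearly maximum (ties are kept).
top_with_ties = flat[flat["n_films"] == yearly_max]

# Keep exactly one genre per year: the first row holding the maximum.
top_one = flat.loc[flat.groupby("release_year")["n_films"].idxmax()]

print(top_with_ties)
print(top_one)
```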