pandas groupby: include a column in the final result

Time: 2018-09-04 14:37:55

Tags: python pandas dataframe pandas-groupby

        cast                year    revenue         title
id              
135397  Chris Pratt         2015    1.392446e+09    Jurassic World
135397  Bryce Dallas Howard 2015    1.392446e+09    Jurassic World
135397  Irrfan Khan         2015    1.392446e+09    Jurassic World
135397  Nick Robinson       2015    1.392446e+09    Jurassic World  

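Given the DataFrame above, I want to find the highest-earning actor for each year (based on the total revenue of their movies released that year). This is what I have so far: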

# Get the total revenue associated with each cast member for each year
f = {'revenue': 'sum'}
# Revenue by year for each cast member
df_actor_yr = df_actor_yr.groupby(['year', 'cast']).agg(f)
df_actor_yr
year    cast    
1960    Anthony Perkins     2.359350e+08
        Charles Laughton    4.423780e+08
        Fred MacMurray      1.843242e+08
        Jack Kruschen       1.843242e+08
        Jean Simmons        4.423780e+08
        John Gavin          2.359350e+08
        Kirk Douglas        4.423780e+08
        Vera Miles          2.359350e+08
1961    Anthony Quayle      2.108215e+08
        Anthony Quinn       2.108215e+08
        Ben Wright          1.574815e+09
        Betty Lou Gerson    1.574815e+09
        ...

Next, to get the highest-earning actor for each year, I did the following:

df_actor_yr.reset_index(inplace=True)
g = {'revenue': 'max'}
df_actor_yr = df_actor_yr.groupby('year').agg(g)

df_actor_yr   

        revenue
year    
1960    4.423780e+08
1961    1.574815e+09
1962    5.045914e+08
1963    5.617734e+08
1964    8.780804e+08
1965    1.129535e+09
1967    1.345551e+09
1968    4.187094e+08
1969    6.081511e+08
...

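This only gives me the year and the maximum revenue for that year. I also want the name of the actor associated with that revenue. How can I do this?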

1 Answer:

Answer 0 (score: 1)

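You can split the logic into two steps. First, use GroupBy + sum to total the revenue by cast and year. Then use GroupBy + idxmax to find the row with the maximum revenue for each year, which keeps the cast column in the result: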

# sum by cast and year
df_summed = df.groupby(['cast', 'year'])['revenue'].sum().reset_index()

# maximums by year
res = df_summed.loc[df_summed.groupby('year')['revenue'].idxmax()]

print(res)

                cast  year       revenue
3       NickRobinson  2012  3.401340e+09
0  BryceDallasHoward  2015  1.568978e+09

For the output above, I used some more interesting data:

id      cast               year    revenue         title
135397  ChrisPratt         2015    1.392446e+09    JurassicWorld
135397  BryceDallasHoward  2015    1.568978e+09    SomeMovie
135397  IrrfanKhan         2012    1.392446e+09    JurassicWorld
135397  NickRobinson       2012    1.046987e+09    JurassicWorld  
135398  NickRobinson       2012    2.354353e+09    SomeOtherMovie
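
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch that builds the sample data above and applies the same two steps. Constructing the frame with a plain `pd.DataFrame` call and keeping `id` as an ordinary column are assumptions made for illustration; the original frame may be laid out differently.

```python
import pandas as pd

# Sample data from the table above (id kept as a regular column here,
# which is an assumption about the layout).
df = pd.DataFrame({
    "id": [135397, 135397, 135397, 135397, 135398],
    "cast": ["ChrisPratt", "BryceDallasHoward", "IrrfanKhan",
             "NickRobinson", "NickRobinson"],
    "year": [2015, 2015, 2012, 2012, 2012],
    "revenue": [1.392446e9, 1.568978e9, 1.392446e9, 1.046987e9, 2.354353e9],
    "title": ["JurassicWorld", "SomeMovie", "JurassicWorld",
              "JurassicWorld", "SomeOtherMovie"],
})

# Step 1: total revenue per (cast, year) pair.
df_summed = df.groupby(["cast", "year"])["revenue"].sum().reset_index()

# Step 2: idxmax returns, for each year, the row label of the largest
# revenue; .loc then pulls the full rows, so cast is preserved.
res = df_summed.loc[df_summed.groupby("year")["revenue"].idxmax()]

print(res)
```

Note that `idxmax` returns only the first occurrence of the maximum, so if two actors tie for the top revenue in a year, only one of them appears in `res`.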